
DevCoach 158 : ML Android | Buat Aplikasi Androidmu Naik Kelas dengan Generative AI

Nad
July 07, 2024
Transcript

  1. In this session, you will learn: what Gen AI is and how it works, the difference between ML & AI, Gen AI model types & applications, and Gen AI in Android.
  2. We are in an AI-driven revolution: steam power (1784), electricity (1870), information technology (1969), and artificial intelligence (today). Source: AI: Recent Trends and Applications, Emerging Communication and Computing.
  3. AI is the theory and development of computer systems able to perform tasks normally requiring human intelligence.
  4. Supervised learning implies the data is already labeled. In supervised learning we learn from past examples to predict future values. (Chart: restaurant tips by order type, tip amount vs. total bill amount, for pick-up and delivery orders.)
  5. Unsupervised learning implies the data is not labeled. Unsupervised problems are all about looking at the raw data and seeing if it naturally falls into groups. Example model: clustering. Is this employee on the "fast-track" or not? (Chart: income vs. job tenure, income vs. years at company.)
  6. (Diagram) Supervised learning: input data x goes into the model, the model predicts an output ŷ, the prediction is compared with the expected output y from the training dataset, and the resulting error drives the model update. Unsupervised learning: input data x goes into the model, which produces a generated example.
  7. Deep learning uses artificial neural networks, allowing them to process more complex patterns than traditional machine learning.
  8. (Diagram of an artificial neural network: an input layer, a hidden layer with thousands of units, and an output layer.)
  9. Generative AI is a subset of Deep Learning (which is itself a subset of ML).
  10. Large Language Models (LLMs) are also a subset of Deep Learning.
  11. Large Language Models (LLMs) also intersect with Generative AI.
  12. Deep learning model types. Generative: generates new data that is similar to the data it was trained on; understands the distribution of the data and how likely a given example is; predicts the next word in a sequence. Discriminative: used to classify or predict; typically trained on a dataset of labeled data; learns the relationship between the features of the data points and the labels.
  13. (Diagram) A GenAI model takes unstructured content as input, learns patterns in that unstructured content, and outputs new content. A predictive ML model takes data and labels as input, learns the relationship between the data and the label, and outputs a label.
  14. In an ML system that maps an input to an output y: it is not GenAI when y is a number, a discrete class, or a probability; it is GenAI when y is natural language, an image, or audio.
  15. y = f(x), where y is the model output, f is the model, and x is the input data. It is not GenAI when y is a number, a discrete class, or a probability; it is GenAI when y is natural language, an image, or audio.
  16. (Diagram) Gen AI: training code, labeled data, and unlabeled data are combined through supervised, semi-supervised, and unsupervised learning into a foundation model, which can then generate new content: text generation, code generation, and image generation.
  17. Large language models. "Large" refers to a large training dataset and a large number of parameters. "General purpose" reflects the commonality of human languages and resource restrictions. They are pre-trained and then fine-tuned.
  18. Benefits of using large language models: (1) a single model can be used for different tasks; (2) the fine-tuning process requires minimal field data; (3) performance keeps growing with more data and parameters.
  19. Gemini marks the next phase on our journey to making AI more helpful for everyone: state-of-the-art multimodal capabilities, highly optimized while preserving choice, and built with responsibility and safety at the core.
  20. Gemini Nano: the most efficient model for on-device tasks, embedded in Android devices. Gemini Pro: the best model for general performance and scaling across a wide range of tasks. Gemini Ultra: the largest and most capable model for highly complex tasks. Gemini Flash: a new lightweight model, optimized for speed and efficiency.
  21. Traditional programming: we hand-code the rules for a cat (type: animal; legs: 4; ears: 2; fur: yes; likes: yarn, catnip, etc.) so the program can say "this is a cat" or "this is not a cat".
  22. Wave of neural networks (~2012): we show the network examples and ask "is this a cat?", and it answers "this is a cat" or "this is not a cat".
  23. Generative language models (LaMDA, PaLM, Gemini, GPT, etc.): "Go read this huge pile of books." Having learned about cats and millions of other concepts, the model can answer "What's a cat?" with "A cat is ...".
  24. What is Generative AI? • GenAI is a type of Artificial Intelligence that creates new content based on what it has learned from existing content. • The process of learning from existing content is called training, and it results in the creation of a statistical model. • When given a prompt, GenAI uses this statistical model to predict what an expected response might be, and this generates new content.
  25. Generative models. Generative language models learn about patterns in language through training data; then, given some text, they predict what comes next. Generative image models produce new images using techniques like diffusion; then, given a prompt or related imagery, they transform random noise into images or generate images from prompts.
  26. Types of Generative AI based on data. Input: image. Outputs: video (animation); image (super resolution); text (image captioning, visual question answering, image search); image completion.
  27. Types of Generative AI based on data. Input: text. Outputs: decisions (playing games); audio (text to speech); image (image generation); text (translation, summarization, question answering, grammar correction); video (video generation).
  28. Generative language models learn about patterns in language through training data. Then, given some text, they predict what comes next.
  29. "I'm making a sandwich with peanut butter and ...": candidate completions include jelly, jam, banana, mayonnaise, fluff, jelly time, and Nutella.
  30. Prompt: "I'm making a sandwich with peanut butter and". Gemini: "Sure, I can help you with that. Here are some of the most popular spreads to put on a peanut butter sandwich: • Jelly or jam: strawberry, grape, raspberry, apricot, peach, blackberry, blueberry, etc."
  31. Prompt: "The meaning of life is". Gemini: "The meaning of life is a question that has been pondered by philosophers and theologians for centuries. There is no one answer that will satisfy everyone, but some possible answers include: to find happiness and fulfillment; to make a difference in the world; to learn and grow as a person; to connect with others and build relationships."
  32. How it works. Pre-training: a large amount of data, billions of parameters, unsupervised learning. (Diagram of a generative pre-trained transformer: the input "How's it going?" passes through the encoding component, and the decoding component produces the output "I'm doing alright, thanks for asking. How are you?")
  33. Hallucinations are words or phrases generated by the model that are often nonsensical or grammatically incorrect.
  34. Challenges: hallucinations can occur when the model is trained on noisy or dirty data, is not trained on enough data, is not given enough context, or is not given enough constraints.
  35. Prompt design: the quality of the input determines the quality of the output. Prompting a large language model, for example: the input "Summarize a text of X" yields summarization, "Give me a list of keywords based on semantic similarity for X" yields keyword extraction, and "Generate a poem in the style of X" yields writing.
  36. Model type: text-to-text. These models take a natural language input and produce text output; they are trained to learn the mapping between a pair of texts (e.g. translation from one language to another). Applications: generation, classification, summarization, translation, (re)search, extraction, clustering, content editing / rewriting.
  37. Model type: text-to-image. Text-to-image models are relatively new and are trained on a large set of images, each captioned with a short text description. Diffusion is one method used to achieve this. Applications: image generation, image editing.
  38. Model types: text-to-video and text-to-3D. Text-to-video models aim to generate a video representation from text input; the input can be anything from a single sentence to a full script, and the output is a video that corresponds to the input text. Similarly, text-to-3D models generate three-dimensional objects that correspond to a user's text description (for use in games or other 3D worlds). Applications: video generation, video editing, game assets.
  39. Model type: text-to-task. Text-to-task models are trained to perform a specific task or action based on text input. This can be a wide range of actions, such as answering a question, performing a search, making a prediction, or taking some sort of action. For example, a text-to-task model could be trained to navigate a web UI or make changes to a doc through the GUI. Applications: software agents, virtual assistants, automation.
  40. (Diagram) Data (text, images, speech, structured data, 3D signals) is used to train a foundation model, which is then adapted to tasks such as question answering, sentiment analysis, information extraction, image captioning, object recognition, and instruction following.
  41. The generative AI application landscape, spanning the application layer across text, code, image, speech, video, 3D, and other modalities: general writing, support (chat / email), sales (email), marketing (content), note taking, text to SQL, code documentation, code generation, web app builders, design, image generation, media / advertising, consumer / social, voice synthesis, audio / music, video editing / generation, 3D models / scenes, gaming, RPA, biology & chemistry, and more.
  42. Gemini | Question 1. Prompt: "This year's sales are 100,000 dollars. Expenses are 60,000 dollars. How much is net profit?" Gemini: "Net profit is calculated by subtracting expenses from sales. In this case, sales are $100,000 and expenses are $60,000. Therefore, net profit is $40,000. Net profit is the amount of money that a company has left after all of its expenses have been paid. It is an important measure of a company's financial health. A high net profit indicates that a [...]"
  43. Gemini | Question 2. Prompt: "Inventory on hand is 6,000 units. New order requires 8,000 units. How many units do I need to fill to complete the order?" Gemini: "To complete the order, you need to fill 8,000 units - 6,000 units = 2,000 units. You can also subtract the inventory on hand from the new order to get the number of units you need to fill: 8,000 units - 6,000 units = 2,000 units."
  44. Gemini | Question 3. Prompt: "We have 1,000 sensors in ten geographic regions. How many sensors do we have on average in each region?" Gemini: "There are 1,000 sensors in ten geographic regions, so there are 1,000 / 10 = 100 sensors on average in each region. However, it is important to note that this is just an average. The number of sensors in each region may vary depending on the size and needs of a region."
  45. Prompt design is the process of creating prompts that elicit the desired response from a language model. (Screenshot of a prompt box: "Enter the prompt here".)
  46. What are prompts and prompt engineering? Prompts involve instructions and context passed to a language model to achieve a desired task. Prompt engineering is the practice of developing and optimizing prompts to efficiently use language models for a variety of applications.
  47. There are 3 main kinds of LLM, and each needs prompting in a different way; the first two are easily confused and give very different outputs. Generic (or raw) language models: these predict the next word (technically, token) based on the language in the training data. Instruction tuned: trained to predict a response to the instructions given in the input. Dialog tuned: trained to have a dialog by predicting the next response.
  48. A token is a part of a word, the atomic unit that LLMs work in. A generic language model is a next-word predictor. (Diagram: the input "The cat sat on" is encoded into vector representations of its tokens, the LLM produces a vector representation for the next output token, and candidate next words are ranked: "a" is the most likely, "it" the next most likely, "the" and others less likely.)
  49. There are 3 main kinds of LLM, and each needs prompting in a different way. Generic (or raw) language models: these predict the next word (technically, token) based on the language in the training data. Instruction tuned: trained to predict a response to the instructions given in the input. Dialog tuned: trained to have a dialog by predicting the next response.
  50. Instruction tuned language models. Prompting, for example: the input "Summarize a text of X" yields summarization, "Give me a list of keywords based on semantic similarity for X" yields keyword extraction, and "Generate a poem in the style of X" yields writing.
  51. Elements of the prompt: context, instructions, input data, and output indicator. Example: instructions: "Classify the text into neutral, negative or positive"; input data: "Text: I think the food was okay."; output indicator: "Sentiment:".
  52. There are 3 main kinds of LLM, and each needs prompting in a different way. Generic (or raw) language models: these predict the next word (technically, token) based on the language in the training data. Instruction tuned: trained to predict a response to the instructions given in the input. Dialog tuned: trained to have a dialog by predicting the next response.
  53. Dialog tuned language models. Dialog-tuned models are a special case of instruction tuned models where requests are typically framed as questions to a chat bot. Dialog tuning is a further specialization of instruction tuning that is expected to happen in the context of a longer back-and-forth conversation, and it typically works better with natural, question-like phrasings. Prompt example: [User] Is the comment "do you like the weather?" ok or toxic? [Bot] ok. [User] Can you briefly say why? Model output: [Bot] It's just a question about the weather; people are not usually upset by that.
  54. Chain-of-thought reasoning: models are better at getting the right answer when they first output text that explains the reason for the answer. Prompt example (the model is less likely to get the correct answer directly): "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now? A:". Adding a cue makes the output more likely to end with the correct answer: "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now? A: Let's think this through step by step." Model output: "Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11." (See the sketch below.)
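      A tiny Kotlin illustration of the technique (plain string handling, not code from the deck; the variable names are arbitrary): the cue phrase from the slide is simply appended to the question before it is sent to the model.

          // Build a chain-of-thought prompt by appending the cue phrase from the slide.
          val question = "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. " +
                  "Each can has 3 tennis balls. How many tennis balls does he have now?"
          val cotPrompt = "Q: $question\nA: Let's think this through step by step."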
  55. Observation: a model that can do everything has practical limitations. Task-specific tuning can make LLMs more reliable.
  56. Tuning: the process of adapting a model to a new domain or set of custom use cases by training the model on new data. For example, we may collect training data and "tune" the LLM specifically for the legal or medical domain.
  57. Fine tuning: bring your own dataset and retrain the model by tuning every weight in the LLM. This requires a big training job (like, really big) and hosting your own fine-tuned model.
  58. (Diagram) Healthcare data (millions of EHRs) is used to train a medical foundation model, which is adapted to tasks such as question answering, chart summarizing, image analysis/labelling, risk stratification, and finding similar patients, with reusable components, natural language interactions, and human-AI collaboration.
  59. Observation: fine tuning is expensive and not realistic in many cases. Are there more efficient methods of tuning?
  60. More efficient methods of tuning. Parameter-Efficient Tuning Methods (PETM): methods for tuning an LLM on your own custom data without duplicating the model. The base model itself is not altered; instead, a small number of add-on layers are tuned, and these can be swapped in and out at inference time. Prompt tuning is one of the easiest parameter-efficient tuning methods.
  61. Google AI Edge SDK: Android integration with on-device inference; recommended for production.
      Google AI client SDK: Android integration with cloud inference; use only for prototyping.
      Vertex AI SDK for Firebase: recommended for production upon stable release.
      Vertex AI SDK (Vertex AI platform): backend integration with cloud inference; recommended for production.
      MediaPipe LLM inference: Android integration with on-device inference; experimental, use with caution!
  62. Google AI client SDK: lets you integrate the Gemini API to use Gemini Pro models in your Android app. A free-of-charge tier lets you experiment at no cost.
  63. Integration in 4 steps (#geminiapi): one, prototype your prompts in Google AI Studio; two, get your API key in Google AI Studio; three, import the dependencies for the Google AI client SDK (see the dependency sketch below); four, integrate Gemini Pro/Flash into your Kotlin code.
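      A minimal Gradle sketch for step three, assuming the Google AI client SDK for Android; the version shown is an assumption, so check the latest release before depending on it.

          // app/build.gradle.kts
          dependencies {
              // Google AI client SDK for Android (Gemini API); pin to the current release.
              implementation("com.google.ai.client.generativeai:generativeai:0.9.0")
          }

      The API key from step two is normally kept out of source control and surfaced to code as a BuildConfig field (for example via the Secrets Gradle plugin), which is how the BuildConfig.geminiApiKey reference in the snippet on slide 65 is assumed to be generated.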
  64. Google AI Studio: the fastest way to start building with Gemini, our next-generation multimodal generative AI model. aistudio.google.com
  65. Configuring the GenerativeModel client:

      // Create a GenerativeModel client from the Google AI client SDK.
      val model = GenerativeModel(
          modelName = "gemini-1.5-flash-latest",
          apiKey = BuildConfig.geminiApiKey,
          generationConfig = generationConfig {
              temperature = 0.7f        // degree of randomness in token selection
              topK = 32                 // sample from the 32 most likely tokens
              topP = 1f                 // nucleus sampling threshold
              maxOutputTokens = 8192    // upper bound on response length
          },
          safetySettings = listOf(
              SafetySetting(HarmCategory.HARASSMENT, BlockThreshold.MEDIUM_AND_ABOVE),
              SafetySetting(HarmCategory.HATE_SPEECH, BlockThreshold.MEDIUM_AND_ABOVE),
              SafetySetting(HarmCategory.SEXUALLY_EXPLICIT, BlockThreshold.MEDIUM_AND_ABOVE),
              SafetySetting(HarmCategory.DANGEROUS_CONTENT, BlockThreshold.MEDIUM_AND_ABOVE)
          )
      )
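      A minimal usage sketch, assuming the model configured above and a coroutine scope (viewModelScope here is only an example); generateContent is a suspend function and response.text can be null when the response is blocked.

          viewModelScope.launch {
              // Single-turn generation: send a prompt and read the generated text.
              val response = model.generateContent("Summarize a text of X")
              Log.d("GenAI", response.text ?: "No text returned")
          }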
  66. Model parameters. Max output tokens: specifies the maximum number of tokens that can be generated in the response; 100 tokens correspond to roughly 60-80 words. Temperature: controls the degree of randomness in token selection; lower temperatures are good for prompts that require a more deterministic or less open-ended response, while higher temperatures can lead to more diverse or creative results. topK / topP: change how the model selects tokens for output; for each token-selection step, the topK tokens with the highest probabilities are sampled, the tokens are then further filtered based on topP, and the final token is selected using temperature sampling.
  67. Text temperature: 0.9f for creative writing or brainstorming; 0.7f for striking a balance; 0.5f for factual writing or tasks requiring high accuracy.
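      As a small illustration of the values above, a hypothetical helper (TaskType and temperatureFor are made-up names, not part of the SDK) could map a task type to these temperatures.

          // Hypothetical mapping from task type to the temperatures suggested on the slide.
          enum class TaskType { CREATIVE, BALANCED, FACTUAL }

          fun temperatureFor(task: TaskType): Float = when (task) {
              TaskType.CREATIVE -> 0.9f  // creative writing or brainstorming
              TaskType.BALANCED -> 0.7f  // striking a balance
              TaskType.FACTUAL -> 0.5f   // factual writing or tasks requiring high accuracy
          }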
  68. Starting a multi-turn chat:

      // Start a chat session seeded with a model greeting
      // ("Hai! Ada yang bisa Aku bantu?" = "Hi! Is there anything I can help with?").
      val chat = model.startChat(
          history = listOf(
              content(role = "model") { text("Hai! Ada yang bisa Aku bantu?") }
          )
      )

      scope.launch {
          // Send a user turn ("Saya ingin bertanya sesuatu kepada kamu..." =
          // "I want to ask you something..."); sendMessage is a suspend function.
          val response = chat.sendMessage("Saya ingin bertanya sesuatu kepada kamu...")
      }
  69. Safety settings. Harassment: negative or harmful comments targeting identity and/or protected attributes. Hate speech: content that is rude, disrespectful, or profane. Sexually explicit: contains references to sexual acts or other lewd content. Dangerous: promotes, facilitates, or encourages harmful acts.
  70. Unfortunately, the SDK is still in preview and is limited to particular device models and brands only.
  71. Trade-offs: (1) the model is too big for a mobile device, 2 GB+; (2) it still requires proper prompting and fine-tuning; (3) it runs on (very) limited devices such as the S24, Samsung Flip, S23, and Pixel 8 (still experimental, via MediaPipe); (4) the inference time to get a result is around 5 to 8 seconds on average. (See the on-device sketch below.)
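      A minimal on-device sketch, assuming the experimental MediaPipe LLM Inference API and a compatible model file already pushed to the device; the model path, token limit, and exact builder methods are assumptions based on the MediaPipe documentation and may change while the API is experimental.

          import com.google.mediapipe.tasks.genai.llminference.LlmInference

          // Build the on-device inference engine from a model file on the device (example path).
          // `context` is assumed to be an Android Context available in your component.
          val options = LlmInference.LlmInferenceOptions.builder()
              .setModelPath("/data/local/tmp/llm/model.bin")
              .setMaxTokens(512) // cap on combined input + output tokens
              .build()
          val llmInference = LlmInference.createFromOptions(context, options)

          // Blocking single-shot generation; expect roughly 5-8 seconds per response on supported devices.
          val answer = llmInference.generateResponse("Summarize a text of X")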
  72. Conclusion: (1) empower your Android app with Gen AI integration; for complex tasks you will still need an ML engineer and a backend team to help; (2) it can be a main feature, a secondary feature, or even both; (3) on-device Gen AI is still costly to do, so wait until its SDK is officially released; (4) give the Gemini API and Gemma a try!
  73. Feedback! Prize: 1 Academy subscription token (30 days), for selected feedback respondents! dicoding.id/devcoachfeedback