
DevCoach 158 : ML Android | Buat Aplikasi Androidmu Naik Kelas dengan Generative AI

Nad
July 07, 2024
Transcript

  1. In this session, you will learn: what Gen AI is and how it works, the difference between ML & AI, Gen AI model types & applications, and Gen AI in Android.
  2. We are in an AI-driven revolution: steam power (1784), electricity (1870), information technology (1969), and artificial intelligence (today). Source: AI: Recent Trends and Applications, Emerging Communication and Computing.
  3. AI is the theory and development of computer systems able to perform tasks normally requiring human intelligence.
  4. Supervised learning implies the data is already labeled. In supervised learning we learn from past examples to predict future values. (Chart: restaurant tips by order type, tip amount vs. total bill amount, for pick-up and delivery orders.)
  5. Unsupervised learning implies the data is not labeled. Unsupervised problems are all about looking at the raw data and seeing if it naturally falls into groups. Example model: clustering. Is this employee on the "fast-track" or not? (Chart: income vs. job tenure, income vs. years at company.)
  6. (Diagram) Supervised learning: input data x goes into the model, the model predicts an output ŷ, the prediction is compared with the expected output y from the training dataset, and the resulting error drives the model update. Unsupervised learning: input data x goes into the model, which produces a generated example.
  7. Deep learning uses artificial neural networks, allowing them to process more complex patterns than traditional machine learning.
  8. (Diagram of an artificial neural network: an input layer, a hidden layer with thousands of units, and an output layer.)
  9. Generative AI is a subset of Deep Learning (which is itself a subset of ML).
  10. Large Language Models (LLMs) are also a subset of Deep Learning.
  11. Large Language Models (LLMs) also intersect with Generative AI.
  12. Deep learning model types. Generative: generates new data that is similar to the data it was trained on; understands the distribution of the data and how likely a given example is; predicts the next word in a sequence. Discriminative: used to classify or predict; typically trained on a dataset of labeled data; learns the relationship between the features of the data points and the labels.
  13. (Diagram) A GenAI model takes unstructured content as input, learns patterns in that unstructured content, and outputs new content. A predictive ML model takes data and labels as input, learns the relationship between the data and the label, and outputs a label.
  14. In an ML system that maps an input to an output y: it is not GenAI when y is a number, a discrete class, or a probability; it is GenAI when y is natural language, an image, or audio.
  15. y = f(x), where y is the model output, f is the model, and x is the input data. It is not GenAI when y is a number, a discrete class, or a probability; it is GenAI when y is natural language, an image, or audio.
  16. (Diagram) Gen AI: training code, labeled data, and unlabeled data are combined through supervised, semi-supervised, and unsupervised learning into a foundation model, which can then generate new content: text generation, code generation, and image generation.
  17. Large language models. "Large" refers to a large training dataset and a large number of parameters. "General purpose" reflects the commonality of human languages and resource restrictions. They are pre-trained and then fine-tuned.
  18. Benefits of using large language models: (1) a single model can be used for different tasks; (2) the fine-tuning process requires minimal field data; (3) performance keeps growing with more data and parameters.
  19. Gemini marks the next phase on our journey to making AI more helpful for everyone: state-of-the-art multimodal capabilities, highly optimized while preserving choice, and built with responsibility and safety at the core.
  20. Gemini Nano: the most efficient model for on-device tasks, embedded in Android devices. Gemini Pro: the best model for general performance and scaling across a wide range of tasks. Gemini Ultra: the largest and most capable model for highly complex tasks. Gemini Flash: a new lightweight model, optimized for speed and efficiency.
  21. Traditional programming: we hand-code the rules for a cat (type: animal; legs: 4; ears: 2; fur: yes; likes: yarn, catnip, etc.) so the program can say "this is a cat" or "this is not a cat".
  22. Wave of neural networks (~2012): we show the network examples and ask "is this a cat?", and it answers "this is a cat" or "this is not a cat".
  23. Generative language models (LaMDA, PaLM, Gemini, GPT, etc.): "Go read this huge pile of books." Having learned about cats and millions of other concepts, the model can answer "What's a cat?" with "A cat is ...".
  24. What is Generative AI? • GenAI is a type of Artificial Intelligence that creates new content based on what it has learned from existing content. • The process of learning from existing content is called training, and it results in the creation of a statistical model. • When given a prompt, GenAI uses this statistical model to predict what an expected response might be, and this generates new content.
  25. Generative models. Generative language models learn about patterns in language through training data; then, given some text, they predict what comes next. Generative image models produce new images using techniques like diffusion; then, given a prompt or related imagery, they transform random noise into images or generate images from prompts.
  26. Types of Generative AI based on data. Input: image. Outputs: video (animation); image (super resolution); text (image captioning, visual question answering, image search); image completion.
  27. Types of Generative AI based on data. Input: text. Outputs: decisions (playing games); audio (text to speech); image (image generation); text (translation, summarization, question answering, grammar correction); video (video generation).
  28. Generative language models learn about patterns in language through training data. Then, given some text, they predict what comes next.
  29. "I'm making a sandwich with peanut butter and ...": candidate completions include jelly, jam, banana, mayonnaise, fluff, jelly time, and Nutella.
  30. Prompt: "I'm making a sandwich with peanut butter and". Gemini: "Sure, I can help you with that. Here are some of the most popular spreads to put on a peanut butter sandwich: • Jelly or jam: strawberry, grape, raspberry, apricot, peach, blackberry, blueberry, etc."
  31. Prompt: "The meaning of life is". Gemini: "The meaning of life is a question that has been pondered by philosophers and theologians for centuries. There is no one answer that will satisfy everyone, but some possible answers include: to find happiness and fulfillment; to make a difference in the world; to learn and grow as a person; to connect with others and build relationships."
  32. How it works. Pre-training: a large amount of data, billions of parameters, unsupervised learning. (Diagram of a generative pre-trained transformer: the input "How's it going?" passes through the encoding component, and the decoding component produces the output "I'm doing alright, thanks for asking. How are you?")
  33. Hallucinations are words or phrases generated by the model that are often nonsensical or grammatically incorrect.
  34. Challenges: hallucinations can occur when the model is trained on noisy or dirty data, is not trained on enough data, is not given enough context, or is not given enough constraints.
  35. Prompt design: the quality of the input determines the quality of the output. Prompting a large language model, for example: the input "Summarize a text of X" yields summarization, "Give me a list of keywords based on semantic similarity for X" yields keyword extraction, and "Generate a poem in the style of X" yields writing.
  36. Model type: text-to-text. These models take a natural language input and produce text output; they are trained to learn the mapping between a pair of texts (e.g. translation from one language to another). Applications: generation, classification, summarization, translation, (re)search, extraction, clustering, content editing / rewriting.
  37. Model type: text-to-image. Text-to-image models are relatively new and are trained on a large set of images, each captioned with a short text description. Diffusion is one method used to achieve this. Applications: image generation, image editing.
  38. Model types: text-to-video and text-to-3D. Text-to-video models aim to generate a video representation from text input; the input can be anything from a single sentence to a full script, and the output is a video that corresponds to the input text. Similarly, text-to-3D models generate three-dimensional objects that correspond to a user's text description (for use in games or other 3D worlds). Applications: video generation, video editing, game assets.
  39. Model type: text-to-task. Text-to-task models are trained to perform a specific task or action based on text input. This can be a wide range of actions, such as answering a question, performing a search, making a prediction, or taking some sort of action. For example, a text-to-task model could be trained to navigate a web UI or make changes to a doc through the GUI. Applications: software agents, virtual assistants, automation.
  40. (Diagram) Data (text, images, speech, structured data, 3D signals) is used to train a foundation model, which is then adapted to tasks such as question answering, sentiment analysis, information extraction, image captioning, object recognition, and instruction following.
  41. The generative AI application landscape, spanning the application layer across text, code, image, speech, video, 3D, and other modalities: general writing, support (chat / email), sales (email), marketing (content), note taking, text to SQL, code documentation, code generation, web app builders, design, image generation, media / advertising, consumer / social, voice synthesis, audio / music, video editing / generation, 3D models / scenes, gaming, RPA, biology & chemistry, and more.
  42. Gemini | Question 1. Prompt: "This year's sales are 100,000 dollars. Expenses are 60,000 dollars. How much is net profit?" Gemini: "Net profit is calculated by subtracting expenses from sales. In this case, sales are $100,000 and expenses are $60,000. Therefore, net profit is $40,000. Net profit is the amount of money that a company has left after all of its expenses have been paid. It is an important measure of a company's financial health. A high net profit indicates that a [...]"
  43. Gemini | Question 2. Prompt: "Inventory on hand is 6,000 units. New order requires 8,000 units. How many units do I need to fill to complete the order?" Gemini: "To complete the order, you need to fill 8,000 units - 6,000 units = 2,000 units. You can also subtract the inventory on hand from the new order to get the number of units you need to fill: 8,000 units - 6,000 units = 2,000 units."
  44. Gemini | Question 3. Prompt: "We have 1,000 sensors in ten geographic regions. How many sensors do we have on average in each region?" Gemini: "There are 1,000 sensors in ten geographic regions, so there are 1,000 / 10 = 100 sensors on average in each region. However, it is important to note that this is just an average. The number of sensors in each region may vary depending on the size and needs of a region."
  45. Prompt design is the process of creating prompts that elicit the desired response from a language model. (Screenshot of a prompt box: "Enter the prompt here".)
  46. What are prompts and prompt engineering? Prompts involve instructions and context passed to a language model to achieve a desired task. Prompt engineering is the practice of developing and optimizing prompts to efficiently use language models for a variety of applications.
  47. There are 3 main kinds of LLM, and each needs prompting in a different way; the first two are easily confused and give very different outputs. Generic (or raw) language models: these predict the next word (technically, token) based on the language in the training data. Instruction tuned: trained to predict a response to the instructions given in the input. Dialog tuned: trained to have a dialog by predicting the next response.
  48. A token is a part of a word, the atomic unit that LLMs work in. A generic language model is a next-word predictor. (Diagram: the input "The cat sat on" is encoded into vector representations of its tokens, the LLM produces a vector representation for the next output token, and candidate next words are ranked: "a" is the most likely, "it" the next most likely, "the" and others less likely.)
  49. There are 3 main kinds of LLM, and each needs prompting in a different way. Generic (or raw) language models: these predict the next word (technically, token) based on the language in the training data. Instruction tuned: trained to predict a response to the instructions given in the input. Dialog tuned: trained to have a dialog by predicting the next response.
  50. Instruction tuned language models. Prompting, for example: the input "Summarize a text of X" yields summarization, "Give me a list of keywords based on semantic similarity for X" yields keyword extraction, and "Generate a poem in the style of X" yields writing.
  51. Elements of the prompt: context, instructions, input data, and output indicator. Example: instructions: "Classify the text into neutral, negative or positive"; input data: "Text: I think the food was okay."; output indicator: "Sentiment:".
  52. There are 3 main kinds of LLM, and each needs prompting in a different way. Generic (or raw) language models: these predict the next word (technically, token) based on the language in the training data. Instruction tuned: trained to predict a response to the instructions given in the input. Dialog tuned: trained to have a dialog by predicting the next response.
  53. Dialog tuned language models. Dialog-tuned models are a special case of instruction tuned models where requests are typically framed as questions to a chat bot. Dialog tuning is a further specialization of instruction tuning that is expected to happen in the context of a longer back-and-forth conversation, and it typically works better with natural, question-like phrasings. Prompt example: [User] Is the comment "do you like the weather?" ok or toxic? [Bot] ok. [User] Can you briefly say why? Model output: [Bot] It's just a question about the weather; people are not usually upset by that.
  54. Chain-of-thought reasoning: models are better at getting the right answer when they first output text that explains the reason for the answer. Prompt example (the model is less likely to get the correct answer directly): "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now? A:". Adding a cue makes the output more likely to end with the correct answer: "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now? A: Let's think this through step by step." Model output: "Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11." (See the sketch below.)
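      A tiny Kotlin illustration of the technique (plain string handling, not code from the deck; the variable names are arbitrary): the cue phrase from the slide is simply appended to the question before it is sent to the model.

          // Build a chain-of-thought prompt by appending the cue phrase from the slide.
          val question = "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. " +
                  "Each can has 3 tennis balls. How many tennis balls does he have now?"
          val cotPrompt = "Q: $question\nA: Let's think this through step by step."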
  55. Observation: a model that can do everything has practical limitations. Task-specific tuning can make LLMs more reliable.
  56. Tuning: the process of adapting a model to a new domain or set of custom use cases by training the model on new data. For example, we may collect training data and "tune" the LLM specifically for the legal or medical domain.
  57. Fine tuning: bring your own dataset and retrain the model by tuning every weight in the LLM. This requires a big training job (like, really big) and hosting your own fine-tuned model.
  58. (Diagram) Healthcare data (millions of EHRs) is used to train a medical foundation model, which is adapted to tasks such as question answering, chart summarizing, image analysis/labelling, risk stratification, and finding similar patients, with reusable components, natural language interactions, and human-AI collaboration.
  59. Observation: fine tuning is expensive and not realistic in many cases. Are there more efficient methods of tuning?
  60. More efficient methods of tuning. Parameter-Efficient Tuning Methods (PETM): methods for tuning an LLM on your own custom data without duplicating the model. The base model itself is not altered; instead, a small number of add-on layers are tuned, and these can be swapped in and out at inference time. Prompt tuning is one of the easiest parameter-efficient tuning methods.
  61. Google AI Edge SDK: Android integration with on-device inference; recommended for production.
      Google AI client SDK: Android integration with cloud inference; use only for prototyping.
      Vertex AI SDK for Firebase: recommended for production upon stable release.
      Vertex AI SDK (Vertex AI platform): backend integration with cloud inference; recommended for production.
      MediaPipe LLM inference: Android integration with on-device inference; experimental, use with caution!
  62. Google AI client SDK: lets you integrate the Gemini API to use Gemini Pro models in your Android app. A free-of-charge tier lets you experiment at no cost.
  63. Integration in 4 steps (#geminiapi): one, prototype your prompts in Google AI Studio; two, get your API key in Google AI Studio; three, import the dependencies for the Google AI client SDK (see the dependency sketch below); four, integrate Gemini Pro/Flash into your Kotlin code.
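      A minimal Gradle sketch for step three, assuming the Google AI client SDK for Android; the version shown is an assumption, so check the latest release before depending on it.

          // app/build.gradle.kts
          dependencies {
              // Google AI client SDK for Android (Gemini API); pin to the current release.
              implementation("com.google.ai.client.generativeai:generativeai:0.9.0")
          }

      The API key from step two is normally kept out of source control and surfaced to code as a BuildConfig field (for example via the Secrets Gradle plugin), which is how the BuildConfig.geminiApiKey reference in the snippet on slide 65 is assumed to be generated.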
  64. Google AI Studio: the fastest way to start building with Gemini, our next-generation multimodal generative AI model. aistudio.google.com
  65. Configuring the GenerativeModel client:

      // Create a GenerativeModel client from the Google AI client SDK.
      val model = GenerativeModel(
          modelName = "gemini-1.5-flash-latest",
          apiKey = BuildConfig.geminiApiKey,
          generationConfig = generationConfig {
              temperature = 0.7f        // degree of randomness in token selection
              topK = 32                 // sample from the 32 most likely tokens
              topP = 1f                 // nucleus sampling threshold
              maxOutputTokens = 8192    // upper bound on response length
          },
          safetySettings = listOf(
              SafetySetting(HarmCategory.HARASSMENT, BlockThreshold.MEDIUM_AND_ABOVE),
              SafetySetting(HarmCategory.HATE_SPEECH, BlockThreshold.MEDIUM_AND_ABOVE),
              SafetySetting(HarmCategory.SEXUALLY_EXPLICIT, BlockThreshold.MEDIUM_AND_ABOVE),
              SafetySetting(HarmCategory.DANGEROUS_CONTENT, BlockThreshold.MEDIUM_AND_ABOVE)
          )
      )
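      A minimal usage sketch, assuming the model configured above and a coroutine scope (viewModelScope here is only an example); generateContent is a suspend function and response.text can be null when the response is blocked.

          viewModelScope.launch {
              // Single-turn generation: send a prompt and read the generated text.
              val response = model.generateContent("Summarize a text of X")
              Log.d("GenAI", response.text ?: "No text returned")
          }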
  66. Model parameters. Max output tokens: specifies the maximum number of tokens that can be generated in the response; 100 tokens correspond to roughly 60-80 words. Temperature: controls the degree of randomness in token selection; lower temperatures are good for prompts that require a more deterministic or less open-ended response, while higher temperatures can lead to more diverse or creative results. topK / topP: change how the model selects tokens for output; for each token-selection step, the topK tokens with the highest probabilities are sampled, the tokens are then further filtered based on topP, and the final token is selected using temperature sampling.
  67. Text temperature: 0.9f for creative writing or brainstorming; 0.7f for striking a balance; 0.5f for factual writing or tasks requiring high accuracy.
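      As a small illustration of the values above, a hypothetical helper (TaskType and temperatureFor are made-up names, not part of the SDK) could map a task type to these temperatures.

          // Hypothetical mapping from task type to the temperatures suggested on the slide.
          enum class TaskType { CREATIVE, BALANCED, FACTUAL }

          fun temperatureFor(task: TaskType): Float = when (task) {
              TaskType.CREATIVE -> 0.9f  // creative writing or brainstorming
              TaskType.BALANCED -> 0.7f  // striking a balance
              TaskType.FACTUAL -> 0.5f   // factual writing or tasks requiring high accuracy
          }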
  68. Starting a multi-turn chat:

      // Start a chat session seeded with a model greeting
      // ("Hai! Ada yang bisa Aku bantu?" = "Hi! Is there anything I can help with?").
      val chat = model.startChat(
          history = listOf(
              content(role = "model") { text("Hai! Ada yang bisa Aku bantu?") }
          )
      )

      scope.launch {
          // Send a user turn ("Saya ingin bertanya sesuatu kepada kamu..." =
          // "I want to ask you something..."); sendMessage is a suspend function.
          val response = chat.sendMessage("Saya ingin bertanya sesuatu kepada kamu...")
      }
  69. Safety settings. Harassment: negative or harmful comments targeting identity and/or protected attributes. Hate speech: content that is rude, disrespectful, or profane. Sexually explicit: contains references to sexual acts or other lewd content. Dangerous: promotes, facilitates, or encourages harmful acts.
  70. Unfortunately, the SDK is still in preview and is limited to particular device models and brands only.
  71. Trade-offs: (1) the model is too big for a mobile device, 2 GB+; (2) it still requires proper prompting and fine-tuning; (3) it runs on (very) limited devices such as the S24, Samsung Flip, S23, and Pixel 8 (still experimental, via MediaPipe); (4) the inference time to get a result is around 5 to 8 seconds on average. (See the on-device sketch below.)
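      A minimal on-device sketch, assuming the experimental MediaPipe LLM Inference API and a compatible model file already pushed to the device; the model path, token limit, and exact builder methods are assumptions based on the MediaPipe documentation and may change while the API is experimental.

          import com.google.mediapipe.tasks.genai.llminference.LlmInference

          // Build the on-device inference engine from a model file on the device (example path).
          // `context` is assumed to be an Android Context available in your component.
          val options = LlmInference.LlmInferenceOptions.builder()
              .setModelPath("/data/local/tmp/llm/model.bin")
              .setMaxTokens(512) // cap on combined input + output tokens
              .build()
          val llmInference = LlmInference.createFromOptions(context, options)

          // Blocking single-shot generation; expect roughly 5-8 seconds per response on supported devices.
          val answer = llmInference.generateResponse("Summarize a text of X")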
  72. Conclusion: (1) empower your Android app with Gen AI integration; for complex tasks you will still need an ML engineer and a backend team to help; (2) it can be a main feature, a secondary feature, or even both; (3) on-device Gen AI is still costly to do, so wait until its SDK is officially released; (4) give the Gemini API and Gemma a try!
  73. Feedback! Prize: 1 Academy subscription token (30 days), for selected feedback respondents! dicoding.id/devcoachfeedback