learning we are learning from past examples to predict future values. Restaurant tips by Order Type 0 80 Total bill amount 0 10 Tip amount Pick-up Delivery
and seeing if it naturally falls into groups Example Model: Clustering Is this employee on the “fast-track” or not? Income vs Job tenure Unsupervised learning implies the data is not labeled 0 20 Years at company 0 60 Income
it was trained on • Understands distribution of data and how likely a given example is • Predict next word in a sequence Discriminative • Used to classify or predict • Typically trained on a dataset of labeled data • Learns the relationship between the features of the data points and the labels Deep Learning Model Types
be used for different tasks The fine-tune process requires minimal filed data The performance is continuously growing with more data and parameters 01 03 02
AI more helpful for everyone State-of-the-art, multimodal capabilities Highly optimized while preserving choice Built with responsibility and safety at the core
Best model for general performance and scaling across a wide range of tasks Gemini Nano Gemini Pro Largest and most capable model for highly complex tasks Gemini Ultra New lightweight model, optimized for speed and efficiency Gemini Flash
Artificial Intelligence that creates new content based on what it has learned from existing content. • The process of learning from existing content is called training and results in the creation of a statistical model. • When given a prompt, GenAI uses this statistical model to predict what an expected response might be–and this generates new content.
patterns in language through training data. Then, given some text, they predict what comes next. Generative image models produce new images using techniques like diffusion. Then, given a prompt or related imagery, they transform random noise into images or generate images from prompts. Generative image models
speech Output: Image Image Generation Output: Text Translation Summarization Question Answering Grammar Correction Video Generation Types of Generative AI Based on Data
can help you with that. Here are some of the most popular spreads to put on a peanut butter sandwich: • Jelly or jam: Strawberry, grape, raspberry, apricot, peach, blackberry, blueberry, etc. Gemini
a question that has been pondered by philosophers and theologians for centuries. There is no one answer that will satisfy everyone, but some possible answers include: To find happiness and fulfillment. To make a difference in the world. To learn and grow as a person. To connect with others and build relationships. Gemini
Billions of parameters • Unsupervised learning Encoding Component How’s it going? Decoding Component I’m doing alright, thanks for asking. How are you? Input Output Generative Pre-Trained Transformer Model Transformer
of the output. Large Language Model Prompting e.g. Output Summarization Keyword extraction Writing Summarize a text of X Generate a poem in the style of X Give me a list of keywords based on semantic similarity for X Input
input and produce text output. These models are trained to learn the mapping between a pair of texts (e.g. translation from one language to another). text-to-text Classification Summarization Translation (Re)Search Extraction Clustering Content editing / rewriting
are trained on a large set of images, each captioned with a short text description. Diffusion is one method used to achieve this. text-to-image Image editing Applications
a video representation from text input. The input text can be anything from a single sentence to a full script, and the output is a video that corresponds to the input text. Similarly Text-to-3D models generate three-dimensional objects that correspond to a user’s text description (for use in games or other 3D worlds). text-to-video Video editing text-to-3D Game assets
perform a specific task or action based on text input. This task can be a wide range of actions such as answering a question, performing a search, making a prediction, or taking some sort of action. For example, a text-to-task model could be trained to navigate web UI or make changes to a doc through the GUI. text-to-task Virtual assistants Automation
Design Voice Synthesis Video editing / generation 3D models / scenes Biology & chemistry Support (chat / email) Sales (email) Marketing (content) Note taking Other Text to SQL Code documentation Code generation Media / Advertising Consumer / Social Image generation Audio Music RPA Gaming Application layer Text Code Image Speech Video 3D Other
Expenses are 60,000 dollars. How much is net profit? Net profit is calculated by subtracting expenses from sales. In this case, sales are $100,000 and expenses are $60,000. Therefore, net profit is $40,000. Net profit is the amount of money that a company has left after all of its expenses have been paid. It is an important measure of a company’s financial health. A high net profit indicated that a [...]
New order requires 8,000 units. How many units do I need to fill to complete the order? To complete the order, you need to fill 8,000 units - 6,000 units = 2,000 units. You can also subtract the inventory on hand from the new order to get the number of units you need to fill: 8,000 units - 6,000 units = 2,000 units.
geographic regions. How many sensors do we have on average in each region? There are 1,000 sensors in ten geographic regions, so there are 1,000 / 10 = 100 sensors on average in each region. However, it is important to note that this is just an average. The number of sensors in each region may vary depending on the size and needs of a region.
context passed to a language model to achieve a desired task. Prompt engineering is the practice of developing and optimizing prompts to efficiently use language models for a variety of applications. Prompt Engineering Prompt Design
word (technically token) based on the language in the training data. Instruction Tuned | Trained to predict a response to the instructions given in the input. Dialog Tuned | Trained to have a dialog by predicting the next response. There are 3 main kinds of LLM, each needs prompting in a different way. The first two are easily confused and give very different outputs.
that LLMs work in. Generic language model - A next word predictor... The cat sat on a it ... the ... ... L L M The vector representation of the input token The vector representation for the next output token Most likely next word Next most likely next word Less likely next words
word (technically token) based on the language in the training data. Instruction Tuned | Trained to predict a response to the instructions given in the input. Dialog Tuned | Trained to have a dialog by predicting the next response. There are 3 main kinds of LLM, each needs prompting in a different way.
Summarization Keyword extraction Writing Summarize a text of X Generate a poem in the style of X Give me a list of keywords based on semantic similarity for X Input
word (technically token) based on the language in the training data. Instruction Tuned | Trained to predict a response to the instructions given in the input. Dialog Tuned | Trained to have a dialog by predicting the next response. There are 3 main kinds of LLM, each needs prompting in a different way.
of instruction tuned where requests are typically framed as questions to a chat bot. Dialog tuning is a further specialization of instruction tuning that is expected to be in the context of a longer back and forth conversation, and typically works better with natural question-like phrasings. Prompt examples [User] Is the comment "do you like the weather?" ok or toxic? [Bot] ok. [User] can you briefly say why? [Bot] Model Output It's just a question about the weather, people are not usually upset by that.
right answer when they first output text that explains the reason for the answer. Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now? A: Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now? A: Let's think this through step by step. The model is less likely to get the correct answer directly. Now the output is more likely to end with the correct answer. A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11. Prompt examples
new domain or set of custom use cases by training the model on new data. For example, we may collect training data and “tune” the LLM specifically for the legal or medical domain. Tuning
Image analysis/labelling Risk stratification Finding similar patients Millions of EHRs Tasks adoption Medical Foundation Model Natural language interactions
without duplicating the model. The base model itself is not altered. Instead, a small number of add-on layers are tuned, which can be swapped in and out at inference time. One of the easiest Parameter Efficient Tuning Methods. Parameter-Efficient Tuning Methods (PETM) Prompt Tuning More efficient methods of tuning
with cloud inference. Use only for prototyping Google AI Edge SDK Google AI client SDK Vertex AI SDK for Firebase. Recommended for production upon stable release Vertex AI SDK Backend integration with cloud inference. Recommended for production Vertex AI platform Android integration with on-device inference. Experimental, use with caution! MediaPipe LLM inference
API key in Google AI Studio Import the dependencies for the Google AI client SDK Integrate Gemini Pro/Flash to your Kotlin code One Two Three Four Integration in 4 steps
in the response. 100 tokens correspond to roughly 60-80 words Controls the degree of randomness in token selection. Lower temperatures are good for prompts that require a more deterministic or less open-ended response, while higher temperatures can lead to more diverse or creative results Max output tokens Temperature Parameter changes how the model selects tokens for output. For each token selection step, the topK tokens with the highest probabilities are sampled. Tokens are then further filtered based on topP with the final token selected using temperature sampling topK / topP Model Parameters
that is rude, disrespectful, or profane Harassment Hate speech Contains references to sexual acts or other lewd content Sexually explicit Promotes, facilitates, or encourages harmful acts Dangerous Safety Settings
mobile device, 2GB+ 2. Still require to have proper prompt and fine tune 3. Runs on (very) limited devices like S24, Samsung Flip, S23, Pixel 8 (still in experimental, mediapipe) 4. The inference duration to get the result is like 5s to 8s in a average
for the complex task is still required ML Engineer and Backend Team to help 2. Either it can be main feature or secondary feature or even both 3. On Device Gen AI still costly to do, wait until the SDK of it is officially released! 4. Give it try to Gemini API and Gemma!