Slide 1

Slide 1 text

Discovering AI Models
Laurent Picard @PicardParis
AI for Everyone! September 18, 2024

Slide 2

Slide 2 text

Hey! I'm Laurent!
Laurent Picard ‒ @PicardParis
○ Developer Advocate ‒ Google Cloud
○ Applied AI, Serverless, Python
Previous lives
○ CTO, cofounder of Bookeen
○ Ebook pioneer (17 years)
○ Educational solutions

Slide 3

Slide 3 text

“Any sufficiently advanced technology is indistinguishable from magic.” — Arthur C. Clarke

Slide 4

Slide 4 text

@PicardParis What is machine learning (for me)? Data → Information

Slide 5

Slide 5 text

@PicardParis What is machine learning? Artificial Intelligence (make machines "intelligent") Machine Learning (learn from data) Deep Learning (using neural networks) Generative AI (create content)

Slide 6

Slide 6 text

@PicardParis How does deep learning work?
How ‒ Using many examples to find answers
Result ‒ Solving problems without explicitly knowing the answer
Origin ‒ Trying to mimic how (we think) our brain works

Slide 7

Slide 7 text

@PicardParis Why is machine learning now possible? Theory + Data + Computing → ML

Slide 8

Slide 8 text

@PicardParis Google AI Milestones ai.google/ai-milestones

Slide 9

Slide 9 text

@PicardParis Four ways we can build with ML in 2024, from Focus on Dev to Focus on ML:
ML APIs ‒ Ready-to-use models
Generative AI ‒ Generative models
Model Tuning ‒ Customized models
Machine Learning ‒ Data & neural networks (building blocks)

Slide 10

Slide 10 text

01 Machine Learning APIs Ready-to-use models

Slide 11

Slide 11 text

@PicardParis Ready-to-use models
Vision API ‒ Image → Info
Video Intelligence API ‒ Video → Info
Natural Language API ‒ Text → Info
Translation API ‒ Text → Translation
Speech-To-Text API ‒ Speech → Text
Text-To-Speech API ‒ Text → Speech

Slide 12

Slide 12 text

@PicardParis Generative AI (Vertex AI: text, image, audio, video, documents)
Vertex AI Studio
- Prompt → Text: chat, summarization, classification, extraction, writing/ideation
- Prompt → Code: code generation, code completion, refactoring, language conversion
- Prompt → Image: image generation; Prompt + Image → Image: image editing
- Image → Info: image captioning; Image + Q → Info: visual Q&A
Vertex AI Agent Builder
- Text → App: search, chat, recommendations, agent
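To give a feel for the "Prompt → Text" path, here is a minimal sketch using the Vertex AI Python SDK; the project ID and the Gemini model name are placeholder assumptions, not values from the slides.

import vertexai
from vertexai.generative_models import GenerativeModel

# Placeholder project and region; Vertex AI must be enabled on the project.
vertexai.init(project="your-project-id", location="us-central1")

model = GenerativeModel("gemini-1.5-flash")  # illustrative model name
response = model.generate_content(
    "Summarize the plot of The Hobbit in two sentences."
)
print(response.text)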

Slide 13

Slide 13 text

@PicardParis Natural Language API Extract information from text

Slide 14

Slide 14 text

@PicardParis Syntax analysis Tolkien was a British writer, poet, philologist, and university professor who is best known as the author of the classic high-fantasy works The Hobbit, The Lord of the Rings, and The Silmarillion.

Slide 15

Slide 15 text

@PicardParis Tolkien was a British writer, poet, philologist, and university professor who is best known as the author of the classic high-fantasy works The Hobbit, The Lord of the Rings, and The Silmarillion. { "language": "en" } Syntax analysis

Slide 16

Slide 16 text

@PicardParis Entity detection Tolkien was a British writer, poet, philologist, and university professor who is best known as the author of the classic high-fantasy works The Hobbit, The Lord of the Rings, and The Silmarillion.

Slide 17

Slide 17 text

@PicardParis Entity detection Tolkien was a British writer, poet, philologist, and university professor who is best known as the author of the classic high-fantasy works The Hobbit, The Lord of the Rings, and The Silmarillion.

Slide 18

Slide 18 text

@PicardParis Entity detection Tolkien was a British writer, poet, philologist, and university professor who is best known as the author of the classic high-fantasy works The Hobbit, The Lord of the Rings, and The Silmarillion.
{
  "name": "British",
  "type": "LOCATION",
  "metadata": {
    "mid": "/m/07ssc",
    "wikipedia_url": "https://en.wikipedia.org/wiki/United_Kingdom"
  }
}
{
  "name": "Tolkien",
  "type": "PERSON",
  "metadata": {
    "mid": "/m/041h0",
    "wikipedia_url": "https://en.wikipedia.org/wiki/J._R._R._Tolkien"
  }
}
{
  "name": "The Silmarillion",
  "type": "WORK_OF_ART",
  "metadata": {
    "mid": "/m/07c4l",
    "wikipedia_url": "https://en.wikipedia.org/wiki/The_Silmarillion"
  }
}
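A minimal sketch of the call behind this slide, using the language_v1 module of the google-cloud-language package; the output formatting is illustrative:

from google.cloud import language_v1

def analyze_text_entities(text):
    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(content=text, type_=language_v1.Document.Type.PLAIN_TEXT)
    response = client.analyze_entities(document=document)
    for entity in response.entities:
        # Each entity comes with a type and, when available, Knowledge Graph metadata.
        print(entity.name,
              language_v1.Entity.Type(entity.type_).name,
              entity.metadata.get("wikipedia_url", ""))

analyze_text_entities("Tolkien was a British writer, poet, philologist, ...")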

Slide 19

Slide 19 text

@PicardParis Content classification Tolkien was a British writer, poet, philologist, and university professor who is best known as the author of the classic high-fantasy works The Hobbit, The Lord of the Rings, and The Silmarillion.
{
  "categories": [
    { "name": "/Books & Literature", "confidence": 0.97 },
    { "name": "/People & Society/Subcultures…", "confidence": 0.66 },
    { "name": "/Hobbies & Leisure", "confidence": 0.58 }
  ]
}

Slide 20

Slide 20 text

@PicardParis Sentiment analysis Two example reviews of “The Hobbit”: a positive one from The New York Times (1938) and a negative one from Goodreads.

Slide 21

Slide 21 text

@PicardParis Text moderation Detect sensitive or harmful content by scoring 16 categories.
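A sketch of how this can be called from Python, assuming the moderate_text method available in recent versions of the google-cloud-language package; the sample text is illustrative:

from google.cloud import language_v2

def moderate(text):
    client = language_v2.LanguageServiceClient()
    document = language_v2.Document(content=text, type_=language_v2.Document.Type.PLAIN_TEXT)
    response = client.moderate_text(document=document)
    # Each moderation category comes back with a confidence score between 0 and 1.
    for category in response.moderation_categories:
        print(f"{category.name}: {category.confidence:.0%}")

moderate("I absolutely loved this book!")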

Slide 22

Slide 22 text

@PicardParis Client libraries

from google.cloud import language
from google.cloud.language import enums, types

def analyze_text_sentiment(text):
    client = language.LanguageServiceClient()
    document = types.Document(content=text, type=enums.Document.Type.PLAIN_TEXT)
    response = client.analyze_sentiment(document=document)
    sentiment = response.document_sentiment
    results = [('text', text), ('score', sentiment.score), ('magnitude', sentiment.magnitude)]
    for k, v in results:
        print('{:10}: {}'.format(k, v))

Python package: pypi.org/project/google-cloud-language
Tutorial code: codelabs.developers.google.com/codelabs/cloud-natural-language-python3

Slide 23

Slide 23 text

@PicardParis Translation API Translate text in 100+ languages

Slide 24

Slide 24 text

@PicardParis Translation API
Translate Many Languages ‒ 100+ different languages, from Afrikaans to Zulu. Used in combination, this enables translation between thousands of language pairs.
Language Detection ‒ Translation API can automatically identify languages with high accuracy (see the sketch below).
Simple Integration ‒ Easy-to-use Google REST API. No need to extract text from your documents: send HTML and get back translated text.
High Quality Translations ‒ High-quality translations that push the boundary of machine translation, updated constantly to seamlessly improve translations and introduce new languages and language pairs.
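A minimal sketch of language detection with the Python client library (translate_v2 module); the sample string is illustrative:

from google.cloud import translate_v2 as translate

client = translate.Client()
result = client.detect_language("Maître corbeau, sur un arbre perché")
# The result includes the detected language code and a confidence score.
print(result["language"], result["confidence"])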

Slide 25

Slide 25 text

@PicardParis Switch to a neural translation model in 2016 Neural Network for Machine Translation, at Production Scale (ai.googleblog.com)

Slide 26

Slide 26 text

@PicardParis Models match empirical studies Exploring Massively Multilingual, Massive Neural Machine Translation (ai.googleblog.com)

Slide 27

Slide 27 text

@PicardParis Models keep improving over time Recent Advances in Google Translate (ai.googleblog.com)

Slide 28

Slide 28 text

@PicardParis Client libraries

from google.cloud import translate

def translate_text(target, text):
    """Translates text into the target language."""
    translate_client = translate.Client()
    # Text can also be a sequence of strings, in which case this method
    # will return a sequence of results for each text.
    result = translate_client.translate(text, target_language=target)
    print('Text: {}'.format(result['input']))
    print('Translation: {}'.format(result['translatedText']))
    print('Detected source language: {}'.format(result['detectedSourceLanguage']))

Sample from Python open source client library
github.com/GoogleCloudPlatform/python-docs-samples

Slide 29

Slide 29 text

@PicardParis Vision API Extract information from images

Slide 30

Slide 30 text

@PicardParis Computer vision before ML Photo by Shaun Jeffers: hobbitontours.com Edge detection with Sobel convolution filter

Slide 31

Slide 31 text

@PicardParis Label detection Photo by Shaun Jeffers: hobbitontours.com
"labelAnnotations": [
  { "description": "Nature", "mid": "/m/05h0n", "score": 0.9516123 },
  { "description": "Flower", "mid": "/m/0c9ph5", "score": 0.91467637 },
  { "description": "Garden", "mid": "/m/0bl0l", "score": 0.903375 },
  …
]
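A minimal sketch of this call with the Python client library, in the same style as the Vision code sample later in the deck; the image URI is a placeholder:

from google.cloud import vision

client = vision.ImageAnnotatorClient()
image = vision.Image()
image.source.image_uri = 'gs://your-bucket/hobbiton.jpg'  # placeholder URI

response = client.label_detection(image=image)
for label in response.label_annotations:
    print(f'{label.description}: {label.score:.0%}')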

Slide 32

Slide 32 text

@PicardParis Object detection Photo by Dominic Monaghan (Instagram)
"localizedObjectAnnotations": [
  { "boundingPoly": {…}, "mid": "/m/01g317", "name": "Person", "score": 0.90216154 },
  { "boundingPoly": {…}, "mid": "/m/01g317", "name": "Person", "score": 0.88069034 },
  { "boundingPoly": {…}, "mid": "/m/01g317", "name": "Person", "score": 0.86947715 },
  …
]

Slide 33

Slide 33 text

@PicardParis Face detection Rendering by Elendil: www.zbrushcentral.com/printthread.php?t=45397
"faceAnnotations": [{
  "detectionConfidence": 0.93634903,
  "boundingPoly": {…},
  "fdBoundingPoly": {…},
  "landmarkingConfidence": 0.18798567,
  "landmarks": [{ "type": "LEFT_EYE", "position": {…} },…],
  "panAngle": -1.7626401,
  "rollAngle": 7.024975,
  "tiltAngle": 9.038818,
  "angerLikelihood": "LIKELY",
  "joyLikelihood": "VERY_UNLIKELY",
  "sorrowLikelihood": "VERY_UNLIKELY",
  "surpriseLikelihood": "VERY_UNLIKELY",
  "headwearLikelihood": "VERY_UNLIKELY",
  "blurredLikelihood": "VERY_UNLIKELY",
  "underExposedLikelihood": "VERY_UNLIKELY"
}]

Slide 34

Slide 34 text

@PicardParis Text detection Screenshot from Goodreads: goodreads.com/quotes/4454
"fullTextAnnotation": {
  "text": "J.R.R. Tolkien > Quotes > Quotable Quote
    \"Three Rings for the Elven-kings under the…
    Seven for the Dwarf-lords in their halls of…
    Nine for Mortal Men, doomed to die,
    One for the Dark Lord on his dark throne
    In the Land of Mordor where the Shadows lie.
    One Ring to rule them all, One Ring to find…
    One Ring to bring them all and in the darkn…
    In the Land of Mordor where the Shadows lie.\"
    - J. R. R. Tolkien, The Lord of the Rings"
}

Slide 35

Slide 35 text

@PicardParis Text detection Screenshot from Goodreads: goodreads.com/quotes/4454
"fullTextAnnotation": {
  "text": "J.R.R. Tolkien > Quotes > Quotable Quote
    \"Three Rings for the Elven-kings under the…
    Seven for the Dwarf-lords in their halls of…
    Nine for Mortal Men, doomed to die,
    One for the Dark Lord on his dark throne
    In the Land of Mordor where the Shadows lie.
    One Ring to rule them all, One Ring to find…
    One Ring to bring them all and in the darkn…
    In the Land of Mordor where the Shadows lie.\"
    - J. R. R. Tolkien, The Lord of the Rings"
}

Slide 36

Slide 36 text

@PicardParis Handwriting detection Tolkien handwriting: pinterest.com/pin/145311525456602832
"fullTextAnnotation": {
  "text": "The Lord of the Rings.
    Three Rings for the Elven-kings under the sky,
    Seven for the Dwarf-lords in their halls of…
    Nine for Mortal Men doomed to die,
    One for the Dark Lord on his dark throne
    In the Land of Mordor where the shadows lie.
    One Ring to rule them all, One Ring to find…
    One Ring to bring them all and in the shadows…
    In the Land of Mordor where the shadows lie\"."
}
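Handwriting (and dense document text) goes through the document_text_detection feature; a minimal sketch with the Python client library, with a placeholder image URI:

from google.cloud import vision

client = vision.ImageAnnotatorClient()
image = vision.Image()
image.source.image_uri = 'gs://your-bucket/tolkien_manuscript.jpg'  # placeholder URI

response = client.document_text_detection(image=image)
# full_text_annotation contains the whole recognized text plus page/block/word structure.
print(response.full_text_annotation.text)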

Slide 37

Slide 37 text

@PicardParis Landmark detection Original photo by Shaun Jeffers: hobbitontours.com
"landmarkAnnotations": [
  {
    "boundingPoly": {…},
    "description": "Hobbiton Movie Set",
    "locations": [
      { "latLng": { "latitude": -37.8723441, "longitude": 175.6833613 } }
    ],
    "mid": "/m/012r3jqg",
    "score": 0.61243546
  }
]

Slide 38

Slide 38 text

@PicardParis Web entity detection and image matching Photo by Bill Potter: elmundo.es/cultura/2017/08/11/598c81b6e2704ebf238b469e.html
"webDetection": {
  "bestGuessLabels": [
    { "label": "jrr tolkien", "languageCode": "es" }
  ],
  "webEntities": [
    { "entityId": "/m/041h0", "score": 14.976, "description": "J. R. R. Tolkien" },…
  ],
  "partialMatchingImages": [
    { "url": "http://e00-elmundo.uecdn.es/…jpg" },…
  ],
  "pagesWithMatchingImages": […],
  "visuallySimilarImages": […]
}

Slide 39

Slide 39 text

@PicardParis Client libraries

from google.cloud import vision

uri_base = 'gs://cloud-vision-codelab'
pics = ('face_surprise.jpg', 'face_no_surprise.png')

client = vision.ImageAnnotatorClient()
image = vision.Image()

for pic in pics:
    image.source.image_uri = f'{uri_base}/{pic}'
    response = client.face_detection(image=image)
    for face in response.face_annotations:
        likelihood = vision.Likelihood(face.surprise_likelihood)
        vertices = [f'({v.x},{v.y})' for v in face.bounding_poly.vertices]
        print(f'Face surprised: {likelihood.name}')
        print(f'Face bounds: {",".join(vertices)}')

Python package: pypi.org/project/google-cloud-vision
Tutorial code: codelabs.developers.google.com/codelabs/cloud-vision-api-python

Slide 40

Slide 40 text

Demo – Vision API

Slide 41

Slide 41 text

@PicardParis Video Intelligence API Extract information from videos

Slide 42

Slide 42 text

@PicardParis Video Intelligence API
Label Detection ‒ Detect entities within the video, such as "dog", "flower" or "car".
Enable Video Search ‒ Search your video catalog the same way you search text documents.
Insights from Videos ‒ Extract actionable insights from video files without requiring any machine learning or computer vision knowledge.
More… ‒ Detect sequences, detect and track objects, detect explicit content, transcribe speech + OCR, logo, face, person detection, pose estimation…
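A minimal label-detection sketch with the Python client library; recent versions accept a request dictionary, and the sample video URI is a public Google Cloud sample used here as an assumption:

from google.cloud import videointelligence

video_client = videointelligence.VideoIntelligenceServiceClient()
operation = video_client.annotate_video(request={
    'input_uri': 'gs://cloud-samples-data/video/JaneGoodall.mp4',
    'features': [videointelligence.Feature.LABEL_DETECTION],
})
result = operation.result(timeout=300)

# Labels detected over whole video segments.
for label in result.annotation_results[0].segment_label_annotations:
    print(label.entity.description)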

Slide 43

Slide 43 text

Demo – Video Intelligence API

Slide 44

Slide 44 text

Demo – Video Intelligence API

Slide 45

Slide 45 text

@PicardParis Client libraries

from google.cloud import videointelligence
from google.cloud.videointelligence import enums, types

def track_objects(video_uri, segments=None):
    video_client = videointelligence.VideoIntelligenceServiceClient()
    features = [enums.Feature.OBJECT_TRACKING]
    context = types.VideoContext(segments=segments)
    print(f'Processing video "{video_uri}"...')
    operation = video_client.annotate_video(input_uri=video_uri,
                                            features=features,
                                            video_context=context)
    return operation.result()

Python package: pypi.org/project/google-cloud-videointelligence
Tutorial code: codelabs.developers.google.com/codelabs/cloud-video-intelligence-python3

Slide 46

Slide 46 text

@PicardParis Speech-to-Text API Convert speech to text in 125 languages

Slide 47

Slide 47 text

@PicardParis Speech-to-Text API
Speech Recognition ‒ Recognizes 125 languages & variants, powered by deep learning neural networks.
Real-Time Results ‒ Can stream text results, returning partial recognition results as they become available. Can also be run on buffered or archived audio files.
Noise Robustness ‒ No need for signal processing or noise cancellation before calling the API. Can handle noisy audio from a variety of environments.
More… ‒ Customized recognition, word timestamps, auto-punctuation, profanity filter, spoken punctuation, spoken emojis… (Preview) Language auto-detection, multiple speaker detection, word-level confidence…

Slide 48

Slide 48 text

@PicardParis Speech timestamps Search for text within your audio
"transcript": "Hello world…",
"confidence": 0.96596134,
"words": [
  { "startTime": "1.400s", "endTime": "1.800s", "word": "Hello" },
  { "startTime": "1.800s", "endTime": "2.300s", "word": "world" },
  …
]
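Assuming a recognition response obtained with enable_word_time_offsets (as in the snippet on the next slide), the per-word timestamps can be read like this; the exact duration formatting depends on the client library version:

def print_word_timestamps(response):
    for result in response.results:
        alternative = result.alternatives[0]
        print(f'Transcript: {alternative.transcript}')
        for word_info in alternative.words:
            # start_time / end_time are offsets from the beginning of the audio.
            print(f'  {word_info.word}: {word_info.start_time} -> {word_info.end_time}')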

Slide 49

Slide 49 text

@PicardParis Client libraries

from google.cloud import speech_v1 as speech

def speech_to_text(config, audio):
    client = speech.SpeechClient()
    response = client.recognize(config, audio)

config = {'language_code': 'fr-FR',
          'enable_automatic_punctuation': True,
          'enable_word_time_offsets': True}
audio = {'uri': 'gs://cloud-samples-data/speech/corbeau_renard.flac'}
speech_to_text(config, audio)

"""
Transcript: Maître corbeau sur un arbre perché tenait en son bec un fromage...
Confidence: 93%
"""

Python package: pypi.org/project/google-cloud-speech
Tutorial code: codelabs.developers.google.com/codelabs/cloud-speech-text-python3

Slide 50

Slide 50 text

@PicardParis Text-to-Speech API Generate natural speech

Slide 51

Slide 51 text

@PicardParis WaveNet natural voices, by DeepMind
https://deepmind.com/blog/wavenet-generative-model-raw-audio
https://deepmind.com/blog/high-fidelity-speech-synthesis-wavenet

Slide 52

Slide 52 text

@PicardParis Which one is the original recording?

Slide 53

Slide 53 text

Demo – Live Search & Response

Slide 54

Slide 54 text

@PicardParis Client libraries

from google.cloud import texttospeech
from google.cloud.texttospeech import enums, types

def text_to_wav(voice_name, text):
    language_code = "-".join(voice_name.split("-")[:2])
    input = types.SynthesisInput(text=text)
    voice = types.VoiceSelectionParams(language_code=language_code, name=voice_name)
    audio_config = types.AudioConfig(audio_encoding=enums.AudioEncoding.LINEAR16)
    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(input, voice, audio_config)
    save_to_wav(f"{language_code}.wav", response.audio_content)

text_to_wav("en-AU-Wavenet-A", "What is the temperature in Sydney?")
text_to_wav("en-GB-Wavenet-B", "What is the temperature in London?")
text_to_wav("en-IN-Wavenet-C", "What is the temperature in Delhi?")

Python package: pypi.org/project/google-cloud-texttospeech
Tutorial code: codelabs.developers.google.com/codelabs/cloud-text-speech-python3

Slide 55

Slide 55 text

@PicardParis Prompt gallery in Vertex AI

Slide 56

Slide 56 text

Demo – Prompt gallery

Slide 57

Slide 57 text

02 Model Tuning (& AutoML) Build your custom model with no expertise

Slide 58

Slide 58 text

@PicardParis Generic results with the Vision API

Slide 59

Slide 59 text

@PicardParis More specific results? CIRRUS ALTOCUMULUS

Slide 60

Slide 60 text

@PicardParis AutoML: Train → Deploy → Serve
Your training data → AutoML → your custom model with a REST API, or your custom edge model (TF Lite on mobile, TF.js in the browser, container anywhere)
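As an illustration of this train/deploy flow, here is a minimal sketch using the Vertex AI Python SDK for an AutoML image classification model; the project, bucket, and display names are placeholders, and the training budget is only an example:

from google.cloud import aiplatform

aiplatform.init(project='your-project-id', location='us-central1')

# Create a managed image dataset from a CSV listing image URIs and labels (placeholder path).
dataset = aiplatform.ImageDataset.create(
    display_name='cloud-types',
    gcs_source='gs://your-bucket/cloud_types.csv',
    import_schema_uri=aiplatform.schema.dataset.ioformat.image.single_label_classification,
)

# Train an AutoML image classification model.
job = aiplatform.AutoMLImageTrainingJob(
    display_name='cloud-types-classifier',
    prediction_type='classification',
)
model = job.run(dataset=dataset, budget_milli_node_hours=8000)

# Deploy the model behind a REST endpoint for online predictions.
endpoint = model.deploy()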

Slide 61

Slide 61 text

@PicardParis Dataset

Slide 62

Slide 62 text

@PicardParis Training

Slide 63

Slide 63 text

@PicardParis Evaluating

Slide 64

Slide 64 text

@PicardParis Serving

Slide 65

Slide 65 text

@PicardParis Auto-generate a custom model from your data
AutoML Vision (image) ‒ custom classification, object detection, pixel segmentation
AutoML Video Intelligence (video) ‒ custom classification, shot detection, object detection/tracking
AutoML Natural Language (text) ‒ custom classification, entity extraction, sentiment analysis
AutoML Translation (text) ‒ custom translation
AutoML Tables (structured data) ‒ custom classification, metrics prediction

Slide 66

Slide 66 text

@PicardParis AutoML in Vertex AI Datasets
Image ‒ classification (single-label), classification (multi-label), object detection, segmentation
Text ‒ classification (single-label), classification (multi-label), entity extraction, sentiment analysis
Video ‒ classification, action recognition, object tracking
Tabular ‒ regression/classification, forecasting
+ Custom translation models with AutoML Translation

Slide 67

Slide 67 text

@PicardParis Model tuning in Vertex AI Studio

Slide 68

Slide 68 text

@PicardParis What are your emotions?
Ready-to-use model (Vision API) ‒ I want to detect faces + general emotions: 😃 Joy, 😮 Surprise, 😢 Sorrow, 😠 Anger
Custom model (AutoML Vision) ‒ I want to detect new custom expressions: 😛 Tongue out, 🥱 Yawning, 😴 Sleeping

Slide 69

Slide 69 text

Stache Club demo serverless architecture
1. User uploads a selfie (web request) → Source Selfies (Cloud Storage)
2. Selfie Processing function (Cloud Functions) is automatically triggered (event trigger)
3. Function gets insights from the ML APIs: Face Detection (Vision API) + Custom Detection (AutoML Vision)
4. Function uploads the result image → Stache Club Selfies (Cloud Storage), served by the Stache Club App (App Engine, managed by the admin)

Slide 70

Slide 70 text

Demo

Slide 71

Slide 71 text

@PicardParis Evaluation: results vs expectations
Results returned by the model (model positives): the ones we expect are true positives; the ones we don't expect are false positives.
Results not returned by the model (model negatives): the expected ones that are missed are false negatives; the unexpected ones are true negatives.

Slide 72

Slide 72 text

@PicardParis Model precision
Precision = true positives / (true positives + false positives)
Precision can be seen as a measure of exactness or quality. High precision means that the model returns substantially more expected results than unexpected ones.

Slide 73

Slide 73 text

@PicardParis Model recall
Recall = true positives / (true positives + false negatives)
Recall can be seen as a measure of completeness or quantity. High recall means that the model returns most of the expected results.
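A tiny worked example of both metrics, with made-up counts purely for illustration:

def precision_recall(true_positives, false_positives, false_negatives):
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall

# Example: the model returns 90 expected results, 10 unexpected ones, and misses 30.
p, r = precision_recall(true_positives=90, false_positives=10, false_negatives=30)
print(f'Precision: {p:.0%}')  # 90%
print(f'Recall: {r:.0%}')     # 75%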

Slide 74

Slide 74 text

@PicardParis AutoML under the hood
Learning to learn ‒ Models to identify optimal model architectures
Transfer learning ‒ Build on existing models
Hyperparameter auto-tuning ‒ Algorithm for finding the best hyperparameters for your model & data

Slide 75

Slide 75 text

@PicardParis Learning to learn: neural architecture search
A controller proposes ML models (layers, learning rate…), the candidate models are trained & evaluated 20K times, and the loop iterates to find the most accurate model.
Research paper: bit.ly/nas-paper

Slide 76

Slide 76 text

@PicardParis Transfer learning: start from a model trained on a lot of data, keep its hidden layers, and update the output using your training data.
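To make the idea concrete (outside Google Cloud), here is a minimal Keras sketch: the pretrained base is frozen and only a small new output layer is trained on your data; the choice of MobileNetV2 and the 3-class head are arbitrary assumptions:

import tensorflow as tf

# Model trained on a lot of data (ImageNet), reused as a frozen feature extractor.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights='imagenet', pooling='avg')
base.trainable = False

# New output layer, trained on your (small) dataset only.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(3, activation='softmax'),  # e.g. 3 cloud types
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# model.fit(your_images, your_labels, epochs=5)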

Slide 77

Slide 77 text

@PicardParis Hyperparameter tuning
● Hyperparameters: any value which affects the accuracy of an algorithm, but is not directly learned by it
● HyperTune: Google-developed algorithm to find the best hyperparameter combinations for your model
● Available as a Cloud API: Vertex AI Vizier
(Figure: objective as a function of HyperParam #1 and HyperParam #2 ‒ we want to find the best combination, not the others)
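To illustrate what a tuning loop does, here is plain random search in Python (not the Bayesian optimization used by HyperTune/Vizier); the objective function is a stand-in:

import random

def evaluate(learning_rate, num_layers):
    """Stand-in for training a model and returning its validation accuracy."""
    return 1.0 - abs(learning_rate - 0.01) * 10 - abs(num_layers - 4) * 0.02

best_score, best_params = float('-inf'), None
for _ in range(50):
    params = {
        'learning_rate': 10 ** random.uniform(-4, -1),  # sample on a log scale
        'num_layers': random.randint(1, 10),
    }
    score = evaluate(**params)
    if score > best_score:
        best_score, best_params = score, params

print('Best hyperparameters found:', best_params)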

Slide 78

Slide 78 text

@PicardParis Object detection prototype

Slide 79

Slide 79 text

03 More machine learning! From focusing on industry verticals… …to building neural networks

Slide 80

Slide 80 text

@PicardParis AI platforms & industry verticals
● Vertex AI ‒ data science + AutoML + MLOps + …
● Document AI ‒ OCR + handwriting + tables + forms, invoices + receipts + …
● Dialogflow ‒ build your chatbot
● Contact Center AI

Slide 81

Slide 81 text

@PicardParis Form automation prototype with Document AI

Slide 82

Slide 82 text

@PicardParis Identity form autofiller with Document AI
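A minimal sketch of sending a document to a Document AI processor with the Python client library; the project, location, processor ID, and file name are placeholders:

from google.cloud import documentai

client = documentai.DocumentProcessorServiceClient()
processor_name = client.processor_path('your-project-id', 'us', 'your-processor-id')

with open('identity_form.pdf', 'rb') as f:
    raw_document = documentai.RawDocument(content=f.read(), mime_type='application/pdf')

request = documentai.ProcessRequest(name=processor_name, raw_document=raw_document)
document = client.process_document(request=request).document

# Full OCR text, plus structured entities for specialized processors (e.g. identity fields).
print(document.text)
for entity in document.entities:
    print(entity.type_, '→', entity.mention_text)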

Slide 83

Slide 83 text

Time to wrap up!

Slide 84

Slide 84 text

@PicardParis How fast & easy is it to build a prototype? (from Focus on Dev to Focus on ML)
ML APIs (ready-to-use models) ‒ Time: hours ‒ Difficulty: none
Generative AI (generative models) ‒ Time: hours ‒ Difficulty: a prompt
Model Tuning (customized models) ‒ Time: days ‒ Difficulty: a dataset
Machine Learning (data & neural networks) ‒ Time: days, weeks… ‒ Difficulty: dataset + NN + …

Slide 85

Slide 85 text

Resources
Ready-to-use machine learning models
- Cloud Vision API ‒ cloud.google.com/vision
- Cloud Video Intelligence API ‒ cloud.google.com/video-intelligence
- Cloud Natural Language API ‒ cloud.google.com/natural-language
- Cloud Translation API ‒ cloud.google.com/translation
- Cloud Speech-To-Text API ‒ cloud.google.com/speech-to-text
- Cloud Text-to-Speech API ‒ cloud.google.com/text-to-speech
Use, customize, and deploy generative models
- Generative AI Studio ‒ cloud.google.com/generative-ai-studio
Build your custom model with your own data without any expertise
- Cloud AutoML ‒ cloud.google.com/automl
Build your model from scratch with deep learning expertise
- Vertex AI ‒ cloud.google.com/vertex-ai
Extract structured information from documents
- Document AI ‒ cloud.google.com/document-ai

Slide 86

Slide 86 text

@PicardParis Python resources & articles
Codelabs (g.co/codelabs)
- Using the Vision API
- Using the Video Intelligence API
- Using the Natural Language API
- Using the Translation API
- Using the Speech-to-Text API
- Using the Text-to-Speech API
Inspirational articles (medium.com/@PicardParis)
- Summarizing videos in 300 lines of code
- Tracking video objects in 300 lines of code
- Face detection and processing in 300 lines of code
- Deploy a coloring page generator in minutes
- From pixels to information with Document AI
- Automate identity document processing
- Moderating text with the Natural Language API
- Deploying a Python serverless function in minutes

Slide 87

Slide 87 text

bit.ly/ml-comic Google AI Online Comic

Slide 88

Slide 88 text

How do LLMs work? ig.ft.com/generative-ai

Slide 89

Slide 89 text

Generative AI sample repository github.com/GoogleCloudPlatform/generative-ai

Slide 90

Slide 90 text

Generative AI in education …/gemini/use-cases/education/use_cases_for_education.ipynb

Slide 91

Slide 91 text

Join Google Cloud Innovators goo.gle/vertexai

Slide 92

Slide 92 text

Thank you! Any questions? Laurent Picard @PicardParis Presentation bit.ly/ml-for-all