Discovering AI Models

Laurent Picard, @PicardParis
AI for Everyone! September 18, 2024

Hey! I'm Laurent!
Laurent Picard ‒ @PicardParis
◦ Developer Advocate ‒ Google Cloud
◦ Applied AI, Serverless, Python
Previous lives
◦ CTO, cofounder of Bookeen
◦ Ebook pioneer (17 years)
◦ Educational solutions

"Any sufficiently advanced technology is indistinguishable from magic." (Arthur C. Clarke)

@PicardParis What is machine learning (for me)? Data Information

What is machine learning?
◦ Artificial Intelligence (make machines "intelligent")
◦ Machine Learning (learn from data)
◦ Deep Learning (using neural networks)
◦ Generative AI (create content)

How does deep learning work?
◦ How: using many examples to find answers
◦ Result: solving problems without explicitly knowing the answer
◦ Origin: trying to mimic how (we think) our brain works

Why is machine learning now possible?
◦ Theory
◦ Data
◦ Computing
Together: ML

@PicardParis Google AI Milestones ai.google/ai-milestones

Four ways we can build with ML in 2024
Building blocks, from "Focus on Dev" to "Focus on ML":
◦ ML APIs: ready-to-use models
◦ Generative AI: generative models
◦ Model Tuning: customized models
◦ Machine Learning: data & neural networks

01 Machine Learning APIs Ready-to-use models

Ready-to-use models

    Input     API                       Output
    Text      Natural Language API      Info
    Text      Translation API           Translation
    Image     Vision API                Info
    Video     Video Intelligence API    Info
    Speech    Speech-To-Text API        Text
    Text      Text-To-Speech API        Speech

Generative AI
Vertex AI: Vertex AI Studio, Vertex AI Agent Builder
Text, Image, Audio, Video, Documents
◦ Prompt → Text: chat, summarization, classification, extraction, writing/ideation
◦ Image → Info: image captioning
◦ Image+Q → Info: visual Q & A
◦ Text → App: search, chat, recommendations, agent
◦ Prompt → Code: code generation, code completion, refactoring, lang. conversion
◦ Prompt → Image: image generation
◦ + Prompt → Image: image editing

@PicardParis Natural Language API Extract information from text

Syntax analysis
Tolkien was a British writer, poet, philologist, and university professor who is best known as the author of the classic high-fantasy works The Hobbit, The Lord of the Rings, and The Silmarillion.

    { "language": "en" }

Entity detection
Tolkien was a British writer, poet, philologist, and university professor who is best known as the author of the classic high-fantasy works The Hobbit, The Lord of the Rings, and The Silmarillion.

    {
      "name": "British",
      "type": "LOCATION",
      "metadata": {
        "mid": "/m/07ssc",
        "wikipedia_url": "https://en.wikipedia.org/wiki/United_Kingdom"
      }
    }
    {
      "name": "Tolkien",
      "type": "PERSON",
      "metadata": {
        "mid": "/m/041h0",
        "wikipedia_url": "https://en.wikipedia.org/wiki/J._R._R._Tolkien"
      }
    }
    {
      "name": "The Silmarillion",
      "type": "WORK_OF_ART",
      "metadata": {
        "mid": "/m/07c4l",
        "wikipedia_url": "https://en.wikipedia.org/wiki/The_Silmarillion"
      }
    }

Content classification
Tolkien was a British writer, poet, philologist, and university professor who is best known as the author of the classic high-fantasy works The Hobbit, The Lord of the Rings, and The Silmarillion.

    {
      "categories": [
        { "name": "/Books & Literature", "confidence": 0.97 },
        { "name": "/People & Society/Subcultures…", "confidence": 0.66 },
        { "name": "/Hobbies & Leisure", "confidence": 0.58 }
      ]
    }
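Each category in the response above carries a confidence score, so a caller would typically keep only the categories above some threshold. A minimal sketch, assuming the response has already been parsed into a dictionary shaped like the JSON above; the function name and the 0.6 default threshold are illustrative and not part of the API:

```python
def confident_categories(response: dict, threshold: float = 0.6) -> list[str]:
    """Return the names of categories whose confidence is at least `threshold`."""
    return [
        category["name"]
        for category in response.get("categories", [])
        if category["confidence"] >= threshold
    ]


# Sample values copied from the slide above.
sample = {
    "categories": [
        {"name": "/Books & Literature", "confidence": 0.97},
        {"name": "/People & Society/Subcultures…", "confidence": 0.66},
        {"name": "/Hobbies & Leisure", "confidence": 0.58},
    ]
}
print(confident_categories(sample))
```

Raising the threshold trades completeness for exactness: with `threshold=0.9`, only the first category is kept.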

Sentiment analysis
2 example reviews of "The Hobbit":
◦ Positive from the NYT (1938)
◦ Negative from GoodReads

Text moderation
Detect sensitive or harmful content by scoring 16 categories.

Client libraries

    # Note: this sample relies on the `enums` and `types` submodules,
    # which belong to pre-2.0 releases of the client library.
    from google.cloud import language
    from google.cloud.language import enums, types


    def analyze_text_sentiment(text):
        client = language.LanguageServiceClient()
        document = types.Document(content=text, type=enums.Document.Type.PLAIN_TEXT)

        response = client.analyze_sentiment(document=document)

        # Overall sentiment of the document: score (polarity) and magnitude (strength).
        sentiment = response.document_sentiment
        results = [
            ('text', text),
            ('score', sentiment.score),
            ('magnitude', sentiment.magnitude),
        ]
        for k, v in results:
            print('{:10}: {}'.format(k, v))

Python package: pypi.org/project/google-cloud-language
Tutorial code: codelabs.developers.google.com/codelabs/cloud-natural-language-python3

@PicardParis Translation API Translate text in 100+ languages

Translation API
◦ Translate Many Languages: 100+ different languages, from Afrikaans to Zulu. Used in combination, this enables translation between thousands of language pairs.
◦ Language Detection: Translation API can automatically identify languages with high accuracy.
◦ Simple Integration: Easy-to-use Google REST API. No need to extract text from your document: just send HTML documents and get back translated text.
◦ High Quality Translations: High-quality translations that push the boundary of machine translation. Updated constantly to seamlessly improve translations and introduce new languages and language pairs.
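The "thousands of language pairs" figure follows from simple counting. A small sketch, assuming (as the slide suggests, though it is an assumption here) that any supported language can be translated into any other; the function name is illustrative:

```python
def ordered_language_pairs(language_count: int) -> int:
    """Count (source, target) pairs in which source and target differ."""
    return language_count * (language_count - 1)


print(ordered_language_pairs(100))  # 9900
```

With exactly 100 languages this gives 9,900 directed pairs, so "100+" languages lands in the thousands.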

Switch to a neural translation model in 2016
"Neural Network for Machine Translation, at Production Scale" (ai.googleblog.com)

Models match empirical studies
"Exploring Massively Multilingual, Massive Neural Machine Translation" (ai.googleblog.com)

Models keep improving over time
"Recent Advances in Google Translate" (ai.googleblog.com)

Client libraries

    # `Client` belongs to the basic (v2) API, exposed as `translate_v2`.
    from google.cloud import translate_v2 as translate


    def translate_text(target, text):
        """Translates text into the target language."""
        translate_client = translate.Client()

        # Text can also be a sequence of strings, in which case this method
        # will return a sequence of results for each text.
        result = translate_client.translate(text, target_language=target)

        print('Text: {}'.format(result['input']))
        print('Translation: {}'.format(result['translatedText']))
        print('Detected source language: {}'.format(result['detectedSourceLanguage']))

Sample from the Python open source client library: github.com/GoogleCloudPlatform/python-docs-samples

@PicardParis Vision API Extract information from images

Computer vision before ML
Edge detection with Sobel convolution filter
Photo by Shaun Jeffers: hobbitontours.com
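To make the "before ML" approach concrete, here is a minimal pure-Python sketch of a Sobel filter. It assumes a grayscale image stored as a list of rows of numbers; the function name and the tiny test image are illustrative, and border pixels are simply left at zero:

```python
# Sobel kernels: horizontal and vertical intensity gradients.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_magnitude(image: list[list[float]]) -> list[list[float]]:
    """Return the gradient magnitude for each interior pixel of a grayscale image."""
    height, width = len(image), len(image[0])
    edges = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0.0
            # Kernels are applied without flipping (correlation);
            # this only changes the gradient sign, not its magnitude.
            for ky in range(3):
                for kx in range(3):
                    pixel = image[y + ky - 1][x + kx - 1]
                    gx += SOBEL_X[ky][kx] * pixel
                    gy += SOBEL_Y[ky][kx] * pixel
            edges[y][x] = (gx * gx + gy * gy) ** 0.5
    return edges


# A 4x4 image with a dark left half and a bright right half.
image = [[0, 0, 1, 1] for _ in range(4)]
edges = sobel_magnitude(image)
print(edges[1])  # interior pixels next to the vertical edge respond strongly
```

The filter responds where intensity changes sharply and stays at zero on uniform regions, which is all it "knows" about the picture.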

Label detection
Photo by Shaun Jeffers: hobbitontours.com

    "labelAnnotations": [
      { "description": "Nature", "mid": "/m/05h0n", "score": 0.9516123 },
      { "description": "Flower", "mid": "/m/0c9ph5", "score": 0.91467637 },
      { "description": "Garden", "mid": "/m/0bl0l", "score": 0.903375 },
      …
    ]

Object detection
Photo by Dominic Monaghan (Instagram)

    "localizedObjectAnnotations": [
      { "boundingPoly": {…}, "mid": "/m/01g317", "name": "Person", "score": 0.90216154 },
      { "boundingPoly": {…}, "mid": "/m/01g317", "name": "Person", "score": 0.88069034 },
      { "boundingPoly": {…}, "mid": "/m/01g317", "name": "Person", "score": 0.86947715 },
      …
    ]

Face detection
Rendering by Elendil: www.zbrushcentral.com/printthread.php?t=45397

    "faceAnnotations": [{
      "detectionConfidence": 0.93634903,
      "boundingPoly": {…},
      "fdBoundingPoly": {…},
      "landmarkingConfidence": 0.18798567,
      "landmarks": [{ "type": "LEFT_EYE", "position": {…} },…],
      "panAngle": -1.7626401,
      "rollAngle": 7.024975,
      "tiltAngle": 9.038818,
      "angerLikelihood": "LIKELY",
      "joyLikelihood": "VERY_UNLIKELY",
      "sorrowLikelihood": "VERY_UNLIKELY",
      "surpriseLikelihood": "VERY_UNLIKELY",
      "headwearLikelihood": "VERY_UNLIKELY",
      "blurredLikelihood": "VERY_UNLIKELY",
      "underExposedLikelihood": "VERY_UNLIKELY"
    }]
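The emotion fields in the response above are likelihood levels rather than numbers, so picking out the detected emotions means comparing levels on an ordered scale. A minimal sketch, assuming the face annotation has been parsed into a dictionary with the key names shown above; the helper name and the "LIKELY" default cutoff are illustrative choices:

```python
# Vision API likelihood levels, ordered from least to most likely.
LIKELIHOOD_ORDER = [
    "UNKNOWN", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY",
]
EMOTIONS = ("joy", "sorrow", "anger", "surprise")


def likely_emotions(face: dict, minimum: str = "LIKELY") -> list[str]:
    """Return the emotions whose likelihood is at least `minimum`."""
    threshold = LIKELIHOOD_ORDER.index(minimum)
    return [
        emotion
        for emotion in EMOTIONS
        if LIKELIHOOD_ORDER.index(face.get(f"{emotion}Likelihood", "UNKNOWN")) >= threshold
    ]


# Emotion fields copied from the slide above.
face = {
    "angerLikelihood": "LIKELY",
    "joyLikelihood": "VERY_UNLIKELY",
    "sorrowLikelihood": "VERY_UNLIKELY",
    "surpriseLikelihood": "VERY_UNLIKELY",
}
print(likely_emotions(face))  # ['anger']
```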

Text detection
Screenshot from Goodreads: goodreads.com/quotes/4454

    "fullTextAnnotation": {
      "text": "
        J.R.R. Tolkien > Quotes > Quotable Quote
        \"Three Rings for the Elven-kings under the…
        Seven for the Dwarf-lords in their halls of…
        Nine for Mortal Men, doomed to die,
        One for the Dark Lord on his dark throne
        In the Land of Mordor where the Shadows lie.
        One Ring to rule them all, One Ring to find…
        One Ring to bring them all and in the darkn…
        In the Land of Mordor where the Shadows lie.\"
        - J. R. R. Tolkien, The Lord of the Rings
      "
    }

Handwriting detection
Tolkien handwriting: pinterest.com/pin/145311525456602832

    "fullTextAnnotation": {
      "text": "
        The Lord of the Rings.
        Three Rings for the Elven-kings under the sky,
        Seven for the Dwarf-lords in their halls of…
        Nine for Mortal Men doomed to die,
        One for the Dark Lord on his dark throne
        In the Land of Mordor where the shadows lie.
        One Ring to rule them all, One Ring to find…
        One Ring to bring them all and in the shadows…
        In the Land of Mordor where the shadows lie\".
      "
    }

Landmark detection
Original photo by Shaun Jeffers: hobbitontours.com

    "landmarkAnnotations": [
      {
        "boundingPoly": {…},
        "description": "Hobbiton Movie Set",
        "locations": [
          { "latLng": { "latitude": -37.8723441, "longitude": 175.6833613 } }
        ],
        "mid": "/m/012r3jqg",
        "score": 0.61243546
      }
    ]

Web entity detection and image matching
Photo by Bill Potter: elmundo.es/cultura/2017/08/11/598c81b6e2704ebf238b469e.html

    "webDetection": {
      "bestGuessLabels": [
        { "label": "jrr tolkien", "languageCode": "es" }
      ],
      "webEntities": [
        { "entityId": "/m/041h0", "score": 14.976, "description": "J. R. R. Tolkien" },…
      ],
      "partialMatchingImages": [
        { "url": "http://e00-elmundo.uecdn.es/…jpg" },…
      ],
      "pagesWithMatchingImages": […],
      "visuallySimilarImages": […]
    }

Client libraries

    from google.cloud import vision

    uri_base = 'gs://cloud-vision-codelab'
    pics = ('face_surprise.jpg', 'face_no_surprise.png')

    client = vision.ImageAnnotatorClient()
    image = vision.Image()

    for pic in pics:
        image.source.image_uri = f'{uri_base}/{pic}'
        response = client.face_detection(image=image)

        for face in response.face_annotations:
            likelihood = vision.Likelihood(face.surprise_likelihood)
            vertices = [f'({v.x},{v.y})' for v in face.bounding_poly.vertices]
            print(f'Face surprised: {likelihood.name}')
            print(f'Face bounds: {",".join(vertices)}')

Python package: pypi.org/project/google-cloud-vision
Tutorial code: codelabs.developers.google.com/codelabs/cloud-vision-api-python

Demo – Vision API

@PicardParis Video Intelligence API Extract information from videos

Video Intelligence API
◦ Label Detection: Detect entities within the video, such as "dog", "flower" or "car".
◦ Enable Video Search: Search your video catalog the same way you search text documents.
◦ Insights from Videos: Extract actionable insights from video files without requiring any machine learning or computer vision knowledge.
◦ More…: Detect sequences, detect and track objects, detect explicit content, transcribe speech, + OCR, logo, face, person detection, pose estimation…

Demo – Video Intelligence API

Client libraries

    # Note: this sample relies on the `enums` and `types` submodules,
    # which belong to pre-2.0 releases of the client library.
    from google.cloud import videointelligence
    from google.cloud.videointelligence import enums, types


    def track_objects(video_uri, segments=None):
        video_client = videointelligence.VideoIntelligenceServiceClient()
        features = [enums.Feature.OBJECT_TRACKING]
        context = types.VideoContext(segments=segments)

        print(f'Processing video "{video_uri}"...')
        operation = video_client.annotate_video(
            input_uri=video_uri,
            features=features,
            video_context=context,
        )
        # Block until the long-running annotation operation completes.
        return operation.result()

Python package: pypi.org/project/google-cloud-videointelligence
Tutorial code: codelabs.developers.google.com/codelabs/cloud-video-intelligence-python3

@PicardParis Speech-to-Text API Convert speech to text in 125 languages

Speech-to-Text API
◦ Speech Recognition: Recognizes 125 languages & variants. Uses deep learning neural networks to power your applications.
◦ Real-Time Results: Can stream text results, returning partial recognition results as they become available. Can also be run on buffered or archived audio files.
◦ Noise Robustness: No need for signal processing or noise cancellation before calling the API. Can handle noisy audio from a variety of environments.
◦ More…: Customized recognition, word timestamps, auto-punctuation, profanity filter, spoken punctuation, spoken emojis, …
◦ (Preview): Language auto-detection, multiple speaker detection, word-level confidence, …

Speech timestamps
Search for text within your audio

    "transcript": "Hello world…",
    "confidence": 0.96596134,
    "words": [
      { "startTime": "1.400s", "endTime": "1.800s", "word": "Hello" },
      { "startTime": "1.800s", "endTime": "2.300s", "word": "world" },
      …
    ]

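Word timestamps are what make "search for text within your audio" possible: once each word has a start time, finding a word means scanning the list and returning the matching offsets. A minimal sketch, assuming the `words` list has been parsed from a response shaped like the one above; the function name is illustrative:

```python
def find_word(words: list[dict], query: str) -> list[float]:
    """Return the start time (in seconds) of every occurrence of `query`."""
    return [
        float(word["startTime"].rstrip("s"))  # "1.800s" -> 1.8
        for word in words
        if word["word"].lower() == query.lower()
    ]


# Values copied from the slide above.
words = [
    {"startTime": "1.400s", "endTime": "1.800s", "word": "Hello"},
    {"startTime": "1.800s", "endTime": "2.300s", "word": "world"},
]
print(find_word(words, "world"))  # [1.8]
```

A player could then seek directly to each returned offset.
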
Client libraries

    from google.cloud import speech_v1 as speech


    def speech_to_text(config, audio):
        client = speech.SpeechClient()
        response = client.recognize(config=config, audio=audio)
        # Print the top alternative of each recognized segment.
        for result in response.results:
            best = result.alternatives[0]
            print(f'Transcript: {best.transcript}')
            print(f'Confidence: {best.confidence:.0%}')


    config = {'language_code': 'fr-FR',
              'enable_automatic_punctuation': True,
              'enable_word_time_offsets': True}
    audio = {'uri': 'gs://cloud-samples-data/speech/corbeau_renard.flac'}

    speech_to_text(config, audio)
    """
    Transcript: Maître corbeau sur un arbre perché tenait en son bec un fromage...
    Confidence: 93%
    """

Python package: pypi.org/project/google-cloud-speech
Tutorial code: codelabs.developers.google.com/codelabs/cloud-speech-text-python3

@PicardParis Text-to-Speech API Generate natural speech

WaveNet natural voices, by DeepMind
https://deepmind.com/blog/wavenet-generative-model-raw-audio
https://deepmind.com/blog/high-fidelity-speech-synthesis-wavenet

@PicardParis Which one is the original recording?

Demo – Live Search & Response

Client libraries

    # Note: this sample relies on the `enums` and `types` submodules,
    # which belong to pre-2.0 releases of the client library.
    from google.cloud import texttospeech
    from google.cloud.texttospeech import enums, types


    def text_to_wav(voice_name, text):
        # The language code is the first two parts of the voice name,
        # e.g. "en-AU-Wavenet-A" -> "en-AU".
        language_code = "-".join(voice_name.split("-")[:2])
        text_input = types.SynthesisInput(text=text)
        voice = types.VoiceSelectionParams(language_code=language_code, name=voice_name)
        audio_config = types.AudioConfig(audio_encoding=enums.AudioEncoding.LINEAR16)

        client = texttospeech.TextToSpeechClient()
        response = client.synthesize_speech(text_input, voice, audio_config)

        # save_to_wav is a helper that is not shown on the slide.
        save_to_wav(f"{language_code}.wav", response.audio_content)


    text_to_wav("en-AU-Wavenet-A", "What is the temperature in Sydney?")
    text_to_wav("en-GB-Wavenet-B", "What is the temperature in London?")
    text_to_wav("en-IN-Wavenet-C", "What is the temperature in Delhi?")

Python package: pypi.org/project/google-cloud-texttospeech
Tutorial code: codelabs.developers.google.com/codelabs/cloud-text-speech-python3
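The sample above calls `save_to_wav` without showing it. One plausible minimal version, offered as an assumption rather than the author's actual helper: it relies on the audio returned for the LINEAR16 encoding already including a WAV header, so the bytes can be written to disk unchanged.

```python
def save_to_wav(filename: str, audio_content: bytes) -> None:
    """Write synthesized audio bytes to `filename` (hypothetical helper)."""
    # Assumption: `audio_content` is already a complete WAV payload
    # (header + samples), so no re-encoding is needed.
    with open(filename, "wb") as out:
        out.write(audio_content)
    print(f"Audio content written to {filename!r}")
```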

@PicardParis Prompt gallery in Vertex AI

Demo – Prompt gallery

02 Model Tuning (& AutoML)
Build your custom model with no expertise

@PicardParis Generic results with the Vision API

@PicardParis More specific results? CIRRUS ALTOCUMULUS

AutoML
Your training data → AutoML (Train, Deploy, Serve) →
◦ Your custom model with a REST API
◦ Your custom edge model: TF Lite (mobile), TF.js (browser), Container (anywhere)

@PicardParis Dataset

@PicardParis Training

@PicardParis Evaluating

@PicardParis Serving

Auto-generate a custom model from your data

    Data              Product                      Custom models
    Image             AutoML Vision                Classification, Object Detection, Pix Segmentation
    Text              AutoML Natural Language      Classification, Entity Extraction, Sentiment Analysis
    Text              AutoML Translation           Custom Translation
    Video             AutoML Video Intelligence    Classification, Shot Detection, Obj. Detect./Track.
    Structured Data   AutoML Tables                Classification, Metrics Prediction

AutoML in Vertex AI Datasets
◦ Image classification (single-label)
◦ Image classification (multi-label)
◦ Image object detection
◦ Image segmentation
◦ Video classification
◦ Video action recognition
◦ Video object tracking
◦ Text classification (single-label)
◦ Text classification (multi-label)
◦ Text entity extraction
◦ Text sentiment analysis
◦ Regression/classification
◦ Forecasting
+ Custom translation models with AutoML Translation

@PicardParis Model tuning in Vertex AI Studio

What are your emotions?
◦ Ready-to-use model (Vision API): "I want to detect faces + general emotions"
  😃 Joy, 😮 Surprise, 😢 Sorrow, 😠 Anger
◦ Custom model (AutoML Vision): "I want to detect new custom expressions"
  😛 Tongue out, 🥱 Yawning, 😴 Sleeping

Stache Club demo: serverless architecture
Components
◦ Stache Club App: App Engine
◦ Source Selfies: Cloud Storage
◦ Selfie Processing: Cloud Functions
◦ Face Detection: Vision API
◦ Custom Detection: AutoML Vision
◦ Stache Club Selfies: Cloud Storage
Actors: User, Admin
Links: Web request, Event trigger
Flow
1. Upload a selfie
2. Function is automatically triggered
3. Function gets insights from ML APIs
4. Function uploads result image

Demo

Evaluation: results vs expectations

                                                       Results we expect    Results we don't expect
    Results returned by model (model positives)        True positives       False positives
    Results not returned by model (model negatives)    False negatives      True negatives
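The four cells of this evaluation grid can be counted directly with set operations. A minimal sketch, assuming results are identified by hashable values; the function name and the tiny example sets are hypothetical:

```python
def confusion_counts(expected: set, returned: set, universe: set) -> dict:
    """Count the four evaluation outcomes for one model run."""
    return {
        "true_positives": len(returned & expected),    # returned and expected
        "false_positives": len(returned - expected),   # returned but not expected
        "false_negatives": len(expected - returned),   # expected but not returned
        "true_negatives": len(universe - expected - returned),  # neither
    }


# Hypothetical example: six candidate results in total.
universe = {"a", "b", "c", "d", "e", "f"}
expected = {"a", "b", "c"}
returned = {"a", "b", "d"}
counts = confusion_counts(expected, returned, universe)
print(counts)
```

Here the model returns two expected results, one unexpected result, misses one expected result, and correctly leaves out two.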

Model precision

$$\text{Precision} = \frac{\text{True positives}}{\text{True positives} + \text{False positives}}$$

Precision can be seen as a measure of exactness or quality. High precision means that the model returns substantially more expected results than unexpected ones.

Model recall

$$\text{Recall} = \frac{\text{True positives}}{\text{True positives} + \text{False negatives}}$$

Recall can be seen as a measure of completeness or quantity. High recall means that the model returns most of the expected results.
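Both metrics are simple ratios over the true positive, false positive, and false negative counts. A short sketch with made-up counts; the function names are illustrative, and returning 0.0 for an empty denominator is a convention chosen here, not something the slides specify:

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of returned results that were expected."""
    returned = true_positives + false_positives
    return true_positives / returned if returned else 0.0


def recall(true_positives: int, false_negatives: int) -> float:
    """Share of expected results that were returned."""
    expected = true_positives + false_negatives
    return true_positives / expected if expected else 0.0


# Hypothetical run: 8 correct results returned, 2 wrong ones, 8 expected ones missed.
print(precision(8, 2))  # 0.8
print(recall(8, 8))     # 0.5
```

This hypothetical model is fairly exact (80% of what it returns is expected) but not very complete (it finds only half of what is expected).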

AutoML under the hood
◦ Learning to learn: models to identify optimal model architectures
◦ Transfer learning: build on existing models
◦ Hyperparameter auto-tuning: algorithm for finding the best hyperparameters for your model & data

Learning to learn: neural architecture search
◦ Controller: proposes ML models
◦ Train & evaluate models, 20K times
◦ Iterate to find the most accurate model
Diagram labels: Layers, Learning rate
Research paper: bit.ly/nas-paper

Transfer learning
◦ Model trained on a lot of data (hidden layers)
◦ Your data
◦ Updated output using your training data

Hyperparameter tuning
• Hyperparameters: any value which affects the accuracy of an algorithm, but is not directly learned by it
• HyperTune: Google-developed algorithm to find the best hyperparameter combinations for your model
• Available as a Cloud API: Vertex AI Vizier
Plot: Objective as a function of HyperParam #1 and HyperParam #2 ("Want to find this", "Not these")
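To illustrate what "finding the best hyperparameter combination" means, here is a deliberately naive exhaustive grid search. It is not HyperTune or Vertex AI Vizier, which use smarter strategies; the toy objective stands in for "train a model and measure its accuracy", and every name below is hypothetical:

```python
import itertools


def grid_search(objective, grid: dict) -> tuple[dict, float]:
    """Try every combination in `grid` and return the best (params, score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(learning_rate: float, layers: int) -> float:
    """Hypothetical objective that peaks at learning_rate=0.1, layers=3."""
    return -((learning_rate - 0.1) ** 2) - (layers - 3) ** 2


best_params, best_score = grid_search(
    toy_objective,
    {"learning_rate": [0.01, 0.1, 1.0], "layers": [1, 3, 5]},
)
print(best_params)
```

Grid search evaluates every combination, which becomes expensive quickly when each evaluation is a full training run; that cost is the motivation for dedicated tuning algorithms.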

@PicardParis Object detection prototype

03 More machine learning!
From focusing on industry verticals… …to building neural networks

AI platforms & industry verticals
• Vertex AI: DS+AutoML+MLOps+…
• Document AI: OCR+HW+Tables+Forms, Invoices+Receipts+…
• Dialogflow: Build your chat bot
• Call Center AI

@PicardParis Form automation prototype with Document AI

@PicardParis Identity form autofiller with Document AI

Time to wrap up!

How fast & easy is it to build a prototype?
From "Focus on Dev" to "Focus on ML":

    Approach                                      Time?           Difficulty?
    ML APIs (ready-to-use models)                 hours           none
    Generative AI (generative models)             hours           prompt
    Model Tuning (customized models)              days            dataset
    Machine Learning (data & neural networks)     days, weeks…    dataset + NN + …

Resources
Ready-to-use machine learning models
◦ Cloud Vision API: cloud.google.com/vision
◦ Cloud Video Intelligence API: cloud.google.com/video-intelligence
◦ Cloud Natural Language API: cloud.google.com/natural-language
◦ Cloud Translation API: cloud.google.com/translation
◦ Cloud Speech-To-Text API: cloud.google.com/speech-to-text
◦ Cloud Text-to-Speech API: cloud.google.com/text-to-speech
Use, customize, and deploy generative models
◦ Generative AI Studio: cloud.google.com/generative-ai-studio
Build your custom model with your own data without any expertise
◦ Cloud AutoML: cloud.google.com/automl
Build your model from scratch with deep learning expertise
◦ Vertex AI: cloud.google.com/vertex-ai
Extract structured information from documents
◦ Document AI: cloud.google.com/document-ai

Python resources & articles
Codelabs (g.co/codelabs)
◦ Using the Vision API
◦ Using the Video Intelligence API
◦ Using the Natural Language API
◦ Using the Translation API
◦ Using the Speech-to-Text API
◦ Using the Text-to-Speech API
Inspirational articles (medium.com/@PicardParis)
◦ Summarizing videos in 300 lines of code
◦ Tracking video objects in 300 lines of code
◦ Face detection and processing in 300 lines of code
◦ Deploy a coloring page generator in minutes
◦ From pixels to information with Document AI
◦ Automate identity document processing
◦ Moderating text with the Natural Language API
◦ Deploying a Python serverless function in minutes

bit.ly/ml-comic Google AI Online Comic

How do LLMs work? ig.ft.com/generative-ai

Generative AI sample repository github.com/GoogleCloudPlatform/generative-ai

Generative AI in education …/gemini/use-cases/education/use_cases_for_education.ipynb

Join Google Cloud Innovators goo.gle/vertexai

Thank you! Any questions? Laurent Picard @PicardParis Presentation bit.ly/ml-for-all
