Building smarter apps with machine learning, from magic to reality

Laurent Picard
September 30, 2021

“Any sufficiently advanced technology is indistinguishable from magic.”
- Arthur C. Clarke

Well, machine learning can look like magic, but you don't need to be a data scientist or an ML researcher to develop with ML.

So, what about making your solution smarter without any AI expertise? With pre-trained models and a few lines of code, machine learning APIs can analyze your data. AutoML techniques can go further and provide more specific insights tailored to your needs.

In this session, you’ll see how to transform or extract information from text, image, audio, and video with the latest ML APIs and how to train an AutoML custom model, and you’ll take an active part in a live demo. Don't put your smartphone in airplane mode!

Transcript

  1. Building smarter solutions with no expertise in machine learning

    Paris, September 30, 2021
    Laurent Picard, @PicardParis
  2. Who are we?

    Laurent Picard ‒ @PicardParis
    Developer Advocate ‒ Google Cloud
    Ebook pioneer (1999..2016)
    Cofounder of Bookeen
    Who are you? Developers? Machine learning users? Cloud users?
  3. “Any sufficiently advanced technology is indistinguishable from magic”

    Arthur C. Clarke
  4. @PicardParis #DevoxxFR What is machine learning? Data Information

  5. What is machine learning, for real?

  6. How does deep learning work?

    How: Using many examples to find answers
    Result: Solving problems without explicitly knowing the answer
    Origin: Trying to mimic how (we think) our brain works
  7. Why is machine learning now possible?

    Theory, Data, Computing, ML
  8. Three ways we can benefit from ML today

    • ML APIs: Ready-to-use models
    • AutoML: Customized models
    • ML: Neural networks
    Building blocks
    Developer skills, ML expertise
  9. 01 Machine Learning APIs Ready-to-use models

  10. Ready-to-use models

    • Vision API: Image → Info
    • Video Intelligence API: Video → Info
    • Natural Language API: Text → Info
    • Translation API: Text → Translation
    • Speech-To-Text API: Speech → Text
    • Text-To-Speech API: Text → Speech
  11. Vision API: Analyze images with a simple request

  12. Computer vision before ML

    Edge detection with Sobel convolution filter
    Photo by Shaun Jeffers: hobbitontours.com
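Slide 12 mentions edge detection with a Sobel filter, the kind of hand-written image processing that predates learned models. As a minimal sketch in pure Python (the function names and the toy image are illustrative assumptions, not taken from the talk), the filter slides two fixed 3x3 kernels over a grayscale image and combines both gradients into an edge strength:

```python
# Sobel kernels: horizontal (X) and vertical (Y) intensity gradients.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def apply_kernel(image, kernel, row, col):
    """Weighted sum of the 3x3 neighbourhood centred on (row, col).

    This is a cross-correlation: the kernel is not flipped as in a strict
    convolution, which only changes the gradient sign, not its magnitude.
    """
    return sum(
        kernel[i][j] * image[row + i - 1][col + j - 1]
        for i in range(3)
        for j in range(3)
    )


def sobel_edges(image):
    """Return the gradient magnitude of every interior pixel (borders stay 0)."""
    height, width = len(image), len(image[0])
    edges = [[0.0] * width for _ in range(height)]
    for row in range(1, height - 1):
        for col in range(1, width - 1):
            gx = apply_kernel(image, SOBEL_X, row, col)
            gy = apply_kernel(image, SOBEL_Y, row, col)
            edges[row][col] = (gx * gx + gy * gy) ** 0.5
    return edges


# Hypothetical 5x5 grayscale image: dark on the left, bright on the right.
toy_image = [[0, 0, 255, 255, 255] for _ in range(5)]
edges = sobel_edges(toy_image)
print(edges[2])
```

The response is strong only next to the dark/bright boundary and zero inside uniform areas. Such a filter finds edges but says nothing about what the picture contains, which is the gap the label, landmark, and object detection slides address.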
  13. Label detection

    "labelAnnotations": [
      { "description": "Nature", "mid": "/m/05h0n", "score": 0.9516123 },
      { "description": "Flower", "mid": "/m/0c9ph5", "score": 0.91467637 },
      { "description": "Garden", "mid": "/m/0bl0l", "score": 0.903375 },
      …
    ]

    Photo by Shaun Jeffers: hobbitontours.com
  14. Landmark detection

    "landmarkAnnotations": [
      {
        "boundingPoly": {…},
        "description": "Hobbiton Movie Set",
        "locations": [ { "latLng": { "latitude": -37.8723441, "longitude": 175.6833613 } } ],
        "mid": "/m/012r3jqg",
        "score": 0.61243546
      }
    ]

    Original photo by Shaun Jeffers: hobbitontours.com
  15. Object detection

    "localizedObjectAnnotations": [
      { "boundingPoly": {…}, "mid": "/m/01g317", "name": "Person", "score": 0.90216154 },
      { "boundingPoly": {…}, "mid": "/m/01g317", "name": "Person", "score": 0.88069034 },
      { "boundingPoly": {…}, "mid": "/m/01g317", "name": "Person", "score": 0.86947715 },
      …
    ]

    Photo by Dominic Monaghan (Instagram)
  16. Face detection

    "faceAnnotations": [{
      "detectionConfidence": 0.93634903,
      "boundingPoly": {…},
      "fdBoundingPoly": {…},
      "landmarkingConfidence": 0.18798567,
      "landmarks": [ { "type": "LEFT_EYE", "position": {…} }, … ],
      "panAngle": -1.7626401,
      "rollAngle": 7.024975,
      "tiltAngle": 9.038818,
      "angerLikelihood": "LIKELY",
      "joyLikelihood": "VERY_UNLIKELY",
      "sorrowLikelihood": "VERY_UNLIKELY",
      "surpriseLikelihood": "VERY_UNLIKELY",
      "headwearLikelihood": "VERY_UNLIKELY",
      "blurredLikelihood": "VERY_UNLIKELY",
      "underExposedLikelihood": "VERY_UNLIKELY"
    }]

    Rendering by Elendil: www.zbrushcentral.com/printthread.php?t=45397
  17. Text detection

    "fullTextAnnotation": { "text": " J.R.R. Tolkien > Quotes > Quotable Quote \"Three Rings for the Elven-kings under the… Seven for the Dwarf-lords in their halls of… Nine for Mortal Men, doomed to die, One for the Dark Lord on his dark throne In the Land of Mordor where the Shadows lie. One Ring to rule them all, One Ring to find… One Ring to bring them all and in the darkn… In the Land of Mordor where the Shadows lie.” - J. R. R. Tolkien, The Lord of the Rings " }

    Screenshot from Goodreads: goodreads.com/quotes/4454
  18. Text detection

    "fullTextAnnotation": { "text": " J.R.R. Tolkien > Quotes > Quotable Quote “Three Rings for the Elven-kings under the… Seven for the Dwarf-lords in their halls of… Nine for Mortal Men, doomed to die, One for the Dark Lord on his dark throne In the Land of Mordor where the Shadows lie. One Ring to rule them all, One Ring to find… One Ring to bring them all and in the darkn… In the Land of Mordor where the Shadows lie.' - J. R. R. Tolkien, The Lord of the Rings " }

    Screenshot from Goodreads: goodreads.com/quotes/4454
  19. Handwriting detection

    "fullTextAnnotation": { "text": " The Lord of the Rings Three Rings for the even-kings under the sky, Seven for the Dwarf-lords in their halls of… Nine for Mortal Men doomed to die, One for the Dark Lord on his dark throne In the Land of Mordor where the shadows lie. One Ring to rule them all, One Ring to find… One Ring to bring them all and in the shadows… In the Land of Mordor where the shadows lie\" " }

    Tolkien handwriting: pinterest.com/pin/145311525456602832
  20. Web entity detection and image matching

    "webDetection": {
      "bestGuessLabels": [ { "label": "jrr tolkien", "languageCode": "es" } ],
      "webEntities": [ { "entityId": "/m/041h0", "score": 14.976, "description": "J. R. R. Tolkien" }, … ],
      "partialMatchingImages": [ { "url": "http://e00-elmundo.uecdn.es/…jpg" }, … ],
      "pagesWithMatchingImages": […],
      "visuallySimilarImages": […]
    }

    Photo by Bill Potter: elmundo.es/cultura/2017/08/11/598c81b6e2704ebf238b469e.html
  21. OSS client libraries

    from google.cloud import vision

    uri_base = 'gs://cloud-vision-codelab'
    pics = ('face_surprise.jpg', 'face_no_surprise.png')

    client = vision.ImageAnnotatorClient()
    image = vision.Image()

    for pic in pics:
        image.source.image_uri = f'{uri_base}/{pic}'
        response = client.face_detection(image=image)

        for face in response.face_annotations:
            likelihood = vision.Likelihood(face.surprise_likelihood)
            vertices = [f'({v.x},{v.y})' for v in face.bounding_poly.vertices]
            print(f'Face surprised: {likelihood.name}')
            print(f'Face bounds: {",".join(vertices)}')

    Python package: pypi.org/project/google-cloud-vision
    Tutorial code: codelabs.developers.google.com/codelabs/cloud-vision-api-python
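The emotion fields in a face annotation are ordered likelihood labels rather than numeric scores. As a minimal sketch working on the JSON form of a response (the `dominant_emotion` helper and its `minimum` threshold are hypothetical, not part of the client library), one way to reduce a face to a single emotion is to rank the labels and keep the highest:

```python
# Likelihood labels used by the Vision API, from least to most likely.
LIKELIHOODS = ["UNKNOWN", "VERY_UNLIKELY", "UNLIKELY",
               "POSSIBLE", "LIKELY", "VERY_LIKELY"]

# JSON field names of the four emotions reported for each face.
EMOTION_FIELDS = {
    "joy": "joyLikelihood",
    "sorrow": "sorrowLikelihood",
    "anger": "angerLikelihood",
    "surprise": "surpriseLikelihood",
}


def dominant_emotion(face, minimum="POSSIBLE"):
    """Hypothetical helper: most likely emotion, or None below `minimum`."""
    best_name = None
    best_rank = LIKELIHOODS.index(minimum) - 1
    for name, field in EMOTION_FIELDS.items():
        rank = LIKELIHOODS.index(face.get(field, "UNKNOWN"))
        if rank > best_rank:
            best_name, best_rank = name, rank
    return best_name


# Emotion fields copied from the face detection slide.
face = {
    "angerLikelihood": "LIKELY",
    "joyLikelihood": "VERY_UNLIKELY",
    "sorrowLikelihood": "VERY_UNLIKELY",
    "surpriseLikelihood": "VERY_UNLIKELY",
}
print(dominant_emotion(face))
```

The threshold is a design choice: requiring at least `POSSIBLE` avoids reporting an emotion when every label is on the unlikely side.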
  22. Demo – Vision API

  23. Video Intelligence API: Analyze videos with a simple request

  24. Video Intelligence API

    • Label Detection: Detect entities within the video, such as "dog", "flower" or "car".
    • Enable Video Search: You can now search your video catalog the same way you search text documents.
    • Insights from Videos: Extract actionable insights from video files without requiring any machine learning or computer vision knowledge.
    • More…: Detect sequences. Detect and track objects. Detect explicit content. Transcribe speech. + OCR, logo, face, person detection, pose estimation, ...
  25. Demo – Video Intelligence API

  26. Demo - Video Intelligence API

  27. OSS client libraries

    from google.cloud import videointelligence as vi

    def track_objects(video_uri, segments=None):
        video_client = vi.VideoIntelligenceServiceClient()
        features = [vi.Feature.OBJECT_TRACKING]
        context = vi.VideoContext(segments=segments)
        request = vi.AnnotateVideoRequest(input_uri=video_uri, features=features, video_context=context)
        print(f'Processing video "{video_uri}"...')
        # annotate_video starts a long-running operation; result() blocks until it completes
        operation = video_client.annotate_video(request=request)
        return operation.result()

    Python package: pypi.org/project/google-cloud-videointelligence
    Tutorial code: codelabs.developers.google.com/codelabs/cloud-video-intelligence-python3
  28. Natural Language API: Analyze text with a simple request

  29. Syntax analysis

    Tolkien was a British writer, poet, philologist, and university professor who is best known as the author of the classic high-fantasy works The Hobbit, The Lord of the Rings, and The Silmarillion.
  30. Syntax analysis

    Tolkien was a British writer, poet, philologist, and university professor who is best known as the author of the classic high-fantasy works The Hobbit, The Lord of the Rings, and The Silmarillion.

    { "language": "en" }
  31. Entity detection

    Tolkien was a British writer, poet, philologist, and university professor who is best known as the author of the classic high-fantasy works The Hobbit, The Lord of the Rings, and The Silmarillion.
  33. Entity detection

    Tolkien was a British writer, poet, philologist, and university professor who is best known as the author of the classic high-fantasy works The Hobbit, The Lord of the Rings, and The Silmarillion.

    { "name": "British", "type": "LOCATION", "metadata": { "mid": "/m/07ssc", "wikipedia_url": "https://en.wikipedia.org/wiki/United_Kingdom" } }
    { "name": "Tolkien", "type": "PERSON", "metadata": { "mid": "/m/041h0", "wikipedia_url": "https://en.wikipedia.org/wiki/J._R._R._Tolkien" } }
    { "name": "The Silmarillion", "type": "WORK_OF_ART", "metadata": { "mid": "/m/07c4l", "wikipedia_url": "https://en.wikipedia.org/wiki/The_Silmarillion" } }
  34. Content classification

    Tolkien was a British writer, poet, philologist, and university professor who is best known as the author of the classic high-fantasy works The Hobbit, The Lord of the Rings, and The Silmarillion.

    { "categories": [
      { "name": "/Books & Literature", "confidence": 0.97 },
      { "name": "/People & Society/Subcultures…", "confidence": 0.66 },
      { "name": "/Hobbies & Leisure", "confidence": 0.58 }
    ] }
  35. Sentiment analysis

    2 example reviews of “The Hobbit”:
    - Positive from the NYT (1938)
    - Negative from GoodReads
  36. OSS client libraries

    from google.cloud import language

    def analyze_text_sentiment(text):
        client = language.LanguageServiceClient()
        document = language.Document(content=text, type_=language.Document.Type.PLAIN_TEXT)
        response = client.analyze_sentiment(document=document)
        sentiment = response.document_sentiment
        results = [('text', text), ('score', sentiment.score), ('magnitude', sentiment.magnitude)]
        for k, v in results:
            print('{:10}: {}'.format(k, v))

    Python package: pypi.org/project/google-cloud-language
    Tutorial code: codelabs.developers.google.com/codelabs/cloud-natural-language-python3
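The sentiment result carries two numbers: a score from -1.0 (negative) to 1.0 (positive) and a non-negative magnitude that grows with the amount of emotional content. As a minimal sketch of how an application might turn them into a label (the `describe_sentiment` helper and both thresholds are assumptions to tune on your own data, not values defined by the API):

```python
def describe_sentiment(score, magnitude,
                       score_threshold=0.25, magnitude_threshold=1.0):
    """Hypothetical helper mapping (score, magnitude) to a coarse label.

    Both thresholds are illustrative assumptions.
    """
    if score >= score_threshold:
        return "positive"
    if score <= -score_threshold:
        return "negative"
    # A score near zero is ambiguous: strong but opposing emotions cancel
    # out (high magnitude), whereas flat text has little emotion at all.
    return "mixed" if magnitude >= magnitude_threshold else "neutral"


# Hypothetical (score, magnitude) pairs, not actual API output.
for score, magnitude in [(0.8, 3.2), (-0.6, 1.9), (0.0, 4.0), (0.1, 0.2)]:
    print(score, magnitude, describe_sentiment(score, magnitude))
```

Looking at the magnitude is what separates a review with strong praise and strong criticism from one that is simply neutral.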
  37. Translation API: Translate text in 100+ languages with a simple request

  38. Translation API

    • Translate Many Languages: 100+ different languages, from Afrikaans to Zulu. Used in combination, this enables translation between thousands of language pairs.
    • Language Detection: Translation API can automatically identify languages with high accuracy.
    • Simple Integration: Easy to use Google REST API. No need to extract text from your document: just send HTML documents and get back translated text.
    • High Quality Translations: High quality translations that push the boundary of Machine Translation. Updated constantly to seamlessly improve translations and introduce new languages and language pairs.
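The "thousands of language pairs" figure follows from simple counting. As a quick check, assuming exactly 100 supported languages (the slide says 100+) and using a hypothetical `language_pairs` helper:

```python
from math import comb, perm


def language_pairs(language_count):
    """Return (directed, undirected) counts of distinct language pairs."""
    # Directed pairs distinguish source from target (en to fr differs from fr to en).
    return perm(language_count, 2), comb(language_count, 2)


directed, undirected = language_pairs(100)
print(directed, undirected)  # 9900 4950
```

With 100 languages there are 100 × 99 = 9,900 source/target combinations, or 4,950 pairs when direction is ignored.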
  39. Switch to a neural translation model in 2016

    Neural Network for Machine Translation, at Production Scale (ai.googleblog.com)
  40. Models match empirical studies

    Exploring Massively Multilingual, Massive Neural Machine Translation (ai.googleblog.com)
  41. Models keep improving over time

    Recent Advances in Google Translate (ai.googleblog.com)
  42. OSS client libraries

    from google.cloud import translate_v2 as translate

    def translate_text(target, text):
        """Translates text into the target language."""
        translate_client = translate.Client()
        # Text can also be a sequence of strings, in which case this method
        # will return a sequence of results for each text.
        result = translate_client.translate(text, target_language=target)
        print('Text: {}'.format(result['input']))
        print('Translation: {}'.format(result['translatedText']))
        print('Detected source language: {}'.format(result['detectedSourceLanguage']))

    Sample from Python open source client library: github.com/GoogleCloudPlatform/python-docs-samples
  43. Speech-to-Text API: Convert speech to text in 120 languages with a simple request

  44. Speech-to-Text API

    • Speech Recognition: Recognizes 120 languages & variants. Uses deep learning neural networks to power your applications.
    • Real-Time Results: Can stream text results, returning partial recognition results as they become available. Can also be run on buffered or archived audio files.
    • Noise Robustness: No need for signal processing or noise cancellation before calling the API. Can handle noisy audio from a variety of environments.
    • More…: Customized recognition, word timestamps, auto-punctuation, …
    • (BETA): Language auto-detection, multiple speaker detection, word-level confidence, …
  45. Speech timestamps: Search for text within your audio

    "transcript": "Hello world…",
    "confidence": 0.96596134,
    "words": [
      { "startTime": "1.400s", "endTime": "1.800s", "word": "Hello" },
      { "startTime": "1.800s", "endTime": "2.300s", "word": "world" },
      …
    ]
  46. OSS client libraries

    from google.cloud import speech_v1 as speech

    def speech_to_text(config, audio):
        client = speech.SpeechClient()
        response = client.recognize(config=config, audio=audio)
        # Print the most probable transcription of each recognized segment
        for result in response.results:
            best = result.alternatives[0]
            print(f'Transcript: {best.transcript}')
            print(f'Confidence: {best.confidence:.0%}')

    config = {'language_code': 'fr-FR',
              'enable_automatic_punctuation': True,
              'enable_word_time_offsets': True}
    audio = {'uri': 'gs://cloud-samples-data/speech/corbeau_renard.flac'}
    speech_to_text(config, audio)
    """
    Transcript: Maître corbeau sur un arbre perché tenait en son bec un fromage...
    Confidence: 93%
    """

    Python package: pypi.org/project/google-cloud-speech
    Tutorial code: codelabs.developers.google.com/codelabs/cloud-speech-text-python3
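With word time offsets enabled, each recognized word comes with a start and end time, which is what makes searching inside audio possible. As a minimal sketch working on the JSON form of a result, where offsets are strings such as "1.400s" (the `find_word` and `parse_seconds` helpers are hypothetical, not part of the client library):

```python
def parse_seconds(offset):
    """Convert a JSON duration such as '1.400s' to a float number of seconds."""
    return float(offset.rstrip("s"))


def find_word(words, query):
    """Hypothetical helper: (start, end) times of every occurrence of `query`."""
    query = query.casefold()
    return [
        (parse_seconds(word["startTime"]), parse_seconds(word["endTime"]))
        for word in words
        if word["word"].casefold() == query
    ]


# Word entries copied from the speech timestamps slide.
words = [
    {"startTime": "1.400s", "endTime": "1.800s", "word": "Hello"},
    {"startTime": "1.800s", "endTime": "2.300s", "word": "world"},
]
print(find_word(words, "world"))  # [(1.8, 2.3)]
```

A player could then seek straight to the returned start time. Matching is exact and case-insensitive here; punctuation attached to words by automatic punctuation would need stripping first.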
  47. Text-to-Speech API: Generate natural speech with a simple request

  48. WaveNet natural voices, by DeepMind

    https://deepmind.com/blog/wavenet-generative-model-raw-audio
    https://deepmind.com/blog/high-fidelity-speech-synthesis-wavenet
  49. Which one is the original recording?

  50. Demo – How can I help!

  51. OSS client libraries

    from google.cloud import texttospeech as tts

    def text_to_wav(voice_name, text):
        language_code = '-'.join(voice_name.split('-')[:2])
        text_input = tts.SynthesisInput(text=text)
        voice = tts.VoiceSelectionParams(language_code=language_code, name=voice_name)
        audio_config = tts.AudioConfig(audio_encoding=tts.AudioEncoding.LINEAR16)
        client = tts.TextToSpeechClient()
        response = client.synthesize_speech(input=text_input, voice=voice, audio_config=audio_config)
        # LINEAR16 responses include a WAV header, so the bytes can be written as-is
        with open(f'{language_code}.wav', 'wb') as out:
            out.write(response.audio_content)

    text_to_wav('en-AU-Wavenet-A', 'What is the temperature in Sydney?')
    text_to_wav('en-GB-Wavenet-B', 'What is the temperature in London?')
    text_to_wav('en-IN-Wavenet-C', 'What is the temperature in Delhi?')

    Python package: pypi.org/project/google-cloud-texttospeech
    Tutorial code: codelabs.developers.google.com/codelabs/cloud-text-speech-python3
  52. 02 AutoML Build your custom model with no expertise

  53. Generic results with the Vision API

  54. More specific results? CIRRUS, ALTOCUMULUS

  55. Cloud AutoML

    Your training data → AutoML (Train, Deploy, Serve) → Your custom model with a REST API
    Your custom edge model: TF Lite (mobile), TF.js (browser), Container (anywhere)
  56. Dataset

  57. Training

  58. Evaluating

  59. Serving

  60. Auto-generate a custom model from your data

    • AutoML Vision (Image): Custom Classification, Object Detection, Pix Segmentation
    • AutoML Natural Language (Text): Custom Classification, Entity Extraction, Sentiment Analysis
    • AutoML Translation (Text): Custom Translation
    • AutoML Video Intelligence (Video): Custom Classification, Shot Detection, Obj. Detect./Track.
    • AutoML Tables (Structured Data): Custom Classification, Metrics Prediction
  61. AutoML in Vertex AI

    • Image classification (single-label)
    • Image classification (multi-label)
    • Image object detection
    • Image segmentation
    • Video classification
    • Video action recognition
    • Video object tracking
    • Text classification (single-label)
    • Text classification (multi-label)
    • Text entity extraction
    • Text sentiment analysis
    • Regression/classification
    Note: AutoML Translation stays a product of its own
  62. What are your emotions?

    Ready-to-use model (Vision API): I want to detect faces + general emotions
    😃 Joy 😮 Surprise 😢 Sorrow 😠 Anger
    Custom model (AutoML Vision): I want to detect new custom expressions
    😛 Tongue out 🥱 Yawning 😴 Sleeping
  63. Stache Club demo serverless architecture

    • Stache Club App: App Engine
    • Source Selfies: Cloud Storage
    • Selfie Processing: Cloud Functions
    • Face Detection: Vision API
    • Custom Detection: AutoML Vision
    • Stache Club Selfies: Cloud Storage
    Actors: User, Admin
    Links: Web request, Event trigger
    1. User uploads selfie
    2. Function is automatically triggered
    3. Function gets insights from ML APIs
    4. Function uploads result image
  64. Demo – Stache Club

  65. Evaluation: results vs expectations

    • Results returned by model (model positives): true positives (results we expect) and false positives (results we don't expect)
    • Results not returned by model (model negatives): false negatives (results we expect) and true negatives (results we don't expect)
  66. Model precision

    $$\text{Precision} = \frac{\text{true positives}}{\text{true positives} + \text{false positives}}$$

    Precision can be seen as a measure of exactness or quality. High precision means that the model returns substantially more expected results than unexpected ones.
  67. Model recall

    $$\text{Recall} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}}$$

    Recall can be seen as a measure of completeness or quantity. High recall means that the model returns most of the expected results.
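To make the two evaluation metrics concrete, here is a minimal sketch that computes them from raw counts. The counts are hypothetical, chosen only to illustrate the arithmetic:

```python
def precision(true_positives, false_positives):
    """Share of the returned results that were expected."""
    returned = true_positives + false_positives
    return true_positives / returned if returned else 0.0


def recall(true_positives, false_negatives):
    """Share of the expected results that were returned."""
    expected = true_positives + false_negatives
    return true_positives / expected if expected else 0.0


# Hypothetical evaluation: the model returns 10 results, 8 of them expected
# (true positives) and 2 unexpected (false positives), and misses 4 expected
# results (false negatives).
p = precision(true_positives=8, false_positives=2)
r = recall(true_positives=8, false_negatives=4)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.80 recall=0.67
```

This model is fairly exact (8 of its 10 results are right) but less complete (it finds 8 of the 12 expected results). Raising a confidence threshold typically trades recall for precision.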
  68. AutoML under the hood

    • Learning to learn: Models to identify optimal model architectures
    • Transfer learning: Build on existing models
    • Hyperparameter auto-tuning: Algorithm for finding the best hyperparameters for your model & data
  69. Learning to learn: neural architecture search

    • Controller: proposes ML models
    • Train & evaluate models 20K times
    • Iterate to find the most accurate model
    Layers, Learning rate
    Research paper: bit.ly/nas-paper
  70. Transfer learning

    Model trained on a lot of data
    Hidden layers
    Your data
    Updated output using your training data
  71. Hyperparameter tuning

    • Hyperparameters: any value that affects the accuracy of an algorithm, but is not directly learned by it
    • HyperTune: Google-developed algorithm to find the best hyperparameter combinations for your model
    • Available (preview) as a Cloud API: Vertex Vizier
    Chart: Objective as a function of HyperParam #1 and HyperParam #2 ("Want to find this", "Not these")
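To illustrate what searching for hyperparameters means, here is a minimal sketch of the simplest possible strategy, an exhaustive grid search over two hyperparameters. This is not HyperTune or Vertex Vizier: the objective is a toy stand-in, and the hyperparameter names and values are assumptions chosen for illustration.

```python
from itertools import product


def toy_objective(learning_rate, dropout):
    """Toy stand-in for a validation score (higher is better).

    A real objective would train and evaluate a model with these values.
    """
    return -((learning_rate - 0.3) ** 2) - (dropout - 0.7) ** 2


def grid_search(objective, grid):
    """Evaluate every combination in `grid`; return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# 11 candidate values per hyperparameter: 0.0, 0.1, ..., 1.0
steps = [i / 10 for i in range(11)]
best_params, best_score = grid_search(
    toy_objective, {"learning_rate": steps, "dropout": steps}
)
print(best_params)
```

The grid needs 121 evaluations for just two hyperparameters, and each evaluation would be a full training run in practice. That cost is the motivation for smarter search algorithms that aim to find good combinations in far fewer trials.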
  72. Work in progress with Vertex AI

  73. Work in progress with Document AI

  74. 03 More machine learning! From focusing on industry verticals… …to building neural networks

  75. AI platforms & industry verticals

    • Vertex AI: DS+AutoML+MLOps+…
    • Document AI: OCR+HW+Tables+Forms, Invoices+Receipts+…
    • Dialogflow: Build your chat bot
    • Call Center AI
  76. TensorFlow: favorite ML repo on GitHub

    3 years later, ML framework consolidation:
    • TensorFlow
    • PyTorch
  77. Time to wrap up!

  78. How can I build smarter solutions?

    • ML APIs (ready-to-use models): Time? hours. Difficulty? none
    • AutoML (customized models): Time? days. Difficulty? dataset
    • ML (neural networks): Time? weeks. Difficulty? dataset + NN + …
    Developer skills, ML expertise
  79. Resources

    Ready-to-use machine learning models
    • Cloud Vision API: cloud.google.com/vision
    • Cloud Video Intelligence API: cloud.google.com/video-intelligence
    • Cloud Natural Language API: cloud.google.com/natural-language
    • Cloud Translation API: cloud.google.com/translation
    • Cloud Speech-To-Text API: cloud.google.com/speech-to-text
    • Cloud Text-to-Speech API: cloud.google.com/text-to-speech
    Build a custom model with your own data without any expertise
    • Cloud AutoML: cloud.google.com/automl
    Build your own model from scratch with deep learning expertise
    • TensorFlow: tensorflow.org
    • Vertex AI: cloud.google.com/vertex-ai
  80. Python resources

    Codelabs (g.co/codelabs)
    • Using the Vision API
    • Using the Video Intelligence API
    • Using the Natural Language API
    • Using the Translation API
    • Using the Speech-to-Text API
    • Using the Text-to-Speech API
    Inspirational articles (medium.com/@PicardParis)
    • Summarizing videos in 300 lines of code
    • Tracking video objects in 300 lines of code
    • Face detection and processing in 300 lines of code
    • Deploying a Python serverless function in minutes with GCP
  81. Online comic from Google AI bit.ly/ml-comic

  82. Thank you!

    Your feedback is welcome: bit.ly/feedback-mlmagic
    Slides + all pointers: bit.ly/ml-for-developers
    Laurent Picard, @PicardParis