
ML Kit, Practical A.I. for Mobile


This talk showcases the ML Kit APIs from Google, which can make iOS and Android apps more engaging and helpful with computer vision that is optimized to run on device. The APIs are built with the same TensorFlow technology used in Google Translate and Google Photos. The example code for text recognition shows the pattern common to all of the API models: text recognition, face detection, barcode scanning, image labeling, object detection, object tracking, smart reply, text translation, and language identification.

Elizabeth's talk will unveil the latest version of the open-source RoloScan app, now migrated to Kotlin and moved from the Firebase APIs to the new ML Kit packages.

The RoloScan source code is available on GitHub. A compiled RoloScan APK is sold on the Play Store.

Elizabeth Mezias

September 26, 2020


Transcript

  1. I am Elizabeth Mezias (@bethmezias, @bostonandroid). GDG Boston Android Lead Organizer, WTM Ambassador, Senior Android Engineer
  2. Code of Conduct
     ★ Be nice, friendly, welcoming. Be someone that other people want to be around.
     ★ Be respectful and constructive. Everyone should take responsibility for the community and defuse tension to stop a negative thread ASAP.
     ★ Be collaborative. Work together! We can learn a lot from each other. Share knowledge, and help each other out.
     ★ Participate! Join in the discussions, offer feedback, and help implement that feedback.
     https://developers.google.com/community/guidelines
  3. Social Responsibility
     Fairness: AI systems should treat all people fairly
     Inclusiveness: AI systems should empower everyone and engage people
     Reliability & Safety: AI systems should perform reliably and safely
     Transparency: AI systems should be understandable
     Privacy & Security: AI systems should be secure and respect privacy
     Accountability: People should be accountable for AI systems
  4. ML Kit: 10 AI models
     ML Kit in 2016, then Firebase, and in 2020 back to ML Kit
     https://developers.google.com/ml-kit
  5. ML Kit 2020
     ★ Started in play-services with 3 computer vision models in 2016: Faces, Barcodes, and Text
     ★ Added pose detection, image detection, tracking, and labeling, plus digital ink for handwriting
     ★ New Natural Language APIs identify the language, translate text, and give a smart reply
  6. “ML Kit brings machine learning expertise to mobile in a powerful and easy-to-use package that makes apps more engaging, personalized, and helpful with solutions that run on device.”
  7. Machine Learning in a box
     ▪ Vision and Natural Language models
     ▪ Image classification/recognition is a common use for custom models
     ▪ You need to divide the data set
       ▫ Training data: examples to build the model
       ▫ Testing data: to grade the accuracy
  8. Juicy details are found online
     ▪ TensorFlow.org: the project on GitHub is used by 95,654 repositories, with 2,735+ contributors
     ▪ 195 Google Public Datasets
     ▪ Easy-to-use ML in Python, C, JavaScript, Java, etc.
     ▪ MNIST database of handwritten digits
     ▪ MS COCO: Common Objects in Context
  9. BERT, MobileBERT, or ALBERT
     A recent blog on Natural Language offers reference code and pretrained models:
     ▪ Text classification: label text from text data.
     ▪ Q&A: feed the model an article and sample questions, and it can answer user questions.
     ▪ Smart Reply: the user’s context generates replies.
     On the TensorFlow Hub, ready for Android Studio or Xcode
  10. SIMPLE SETUP
      Android: the project build.gradle file has Google Maven, and the app or module using the APIs has the GMS ML Kit dependency play-services-mlkit-text-recognition. The metadata is set in the manifest: mlkit.vision.DEPENDENCIES = 'ocr'
      iOS: include the pod 'GoogleMLKit/TextRecognition' and open the project with Xcode version 11.3.1 or later
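The Gradle side of that setup can be sketched in Kotlin DSL; the artifact version is an assumption from around the 2020 standalone ML Kit launch, so check the release notes for the current one:

```kotlin
// app/build.gradle.kts: the Google Play services ("unbundled") text recognition model
dependencies {
    // Version is illustrative; see the ML Kit release notes for the latest
    implementation("com.google.android.gms:play-services-mlkit-text-recognition:16.0.0")
}
```

With the manifest entry `<meta-data android:name="com.google.mlkit.vision.DEPENDENCIES" android:value="ocr" />`, Google Play downloads the OCR model at install time instead of on first use.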
  11. 1. Prepare the input image
      2. Get an instance of TextRecognizer
      3. Process the image
      4. Use the blocks of recognized text
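The four steps above can be sketched as one Kotlin function; this is a sketch against the 2020 API surface (the no-argument `TextRecognition.getClient()`) and only runs on an Android device:

```kotlin
import android.content.Context
import android.net.Uri
import android.util.Log
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.text.Text
import com.google.mlkit.vision.text.TextRecognition

fun recognizeText(context: Context, imageUri: Uri, onResult: (Text) -> Unit) {
    // 1. Prepare the input image; rotation is read from the file's EXIF data
    val image = InputImage.fromFilePath(context, imageUri)
    // 2. Get an instance of TextRecognizer
    val recognizer = TextRecognition.getClient()
    // 3. Process the image; the returned Task completes asynchronously
    recognizer.process(image)
        // 4. Use the blocks of recognized text
        .addOnSuccessListener { text -> onResult(text) }
        .addOnFailureListener { e -> Log.e("OCR", "Text recognition failed", e) }
}
```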
  12. 3 lines to OCR
      val imageInputStream = context.contentResolver.openInputStream(imageUri)
      val decodedBitmap = BitmapFactory.decodeStream(imageInputStream, null, opts)
      InputImage.fromBitmap(decodedBitmap, 0)
      Feed the InputImage object into the TextRecognizer on a background thread
  13. Coroutines for easy threading
      suspend fun scanImage(image: InputImage): Text {
          val client = TextRecognition.getClient()
          try {
              return client.process(image).await()  // await() from kotlinx-coroutines-play-services
          } finally {
              client.close()
          }
      }
      SuccessListener -> :Text
      The ML Kit Text object is the hierarchical representation: a list of Text.TextBlock, which is a list of Text.Line, which is a list of Text.Element.
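Walking that hierarchy is plain collection code; a small helper, assuming you want one String per recognized line:

```kotlin
import com.google.mlkit.vision.text.Text

// Flatten Text -> TextBlock -> Line into a list of line strings
fun linesOf(result: Text): List<String> =
    result.textBlocks.flatMap { block -> block.lines.map { line -> line.text } }
```

Each `Text.Line` still exposes its elements (roughly words) if finer granularity is needed.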
  14. Barcode Scanning
      Reads standard barcodes with any rotation or camera orientation
      ★ Linear formats: Codabar, Code 39, Code 93, Code 128, EAN-8, EAN-13, ITF, UPC-A, UPC-E
      ★ 2D formats: Aztec, Data Matrix, PDF417, QR Code
      Uses the same InputImage type as TextRecognition
  15. Barcode Scanning
      val scanner = BarcodeScanning.getClient()
      scanner.process(image)
      SuccessListener -> Barcode
      A Barcode is a single recognized code and its value. Enums for the type, and the expected fields for the supported format, are found in it. Structured data looks like this:
      when (valueType) {
          Barcode.TYPE_URL -> {
              val title = barcode.url!!.title
              val url = barcode.url!!.url
          }
      }
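A fuller version of that success listener, branching on the structured value type; a sketch against the 2020 `com.google.mlkit.vision.barcode` package (the Barcode class later moved to `barcode.common`):

```kotlin
import com.google.mlkit.vision.barcode.Barcode
import com.google.mlkit.vision.barcode.BarcodeScanning
import com.google.mlkit.vision.common.InputImage

fun scanBarcodes(image: InputImage, onUrl: (title: String?, url: String?) -> Unit) {
    val scanner = BarcodeScanning.getClient()
    scanner.process(image)
        .addOnSuccessListener { barcodes ->
            // The Task yields every barcode found in the frame
            for (barcode in barcodes) {
                when (barcode.valueType) {
                    Barcode.TYPE_URL ->
                        onUrl(barcode.url?.title, barcode.url?.url)
                    else -> {
                        // rawValue is always present as a fallback
                        val raw = barcode.rawValue
                    }
                }
            }
        }
}
```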
  16. Translate Text
      The Natural Language APIs work with Strings and digital text, not images
      ★ More than 50 languages
      ★ Models proven with Google Translate in offline mode
      ★ Runs fast on-device, with language packs to manage the space required
      ML Kit's models are built on translating to and from English. For other languages, English is used as an intermediate, which can affect the result quality. Language models are around 30MB each.
  17. Source/Target models = 60MB
      val options = TranslatorOptions.Builder()
          .setSourceLanguage(TranslateLanguage.ENGLISH)
          .setTargetLanguage(TranslateLanguage.GERMAN)
          .build()
      val englishGermanTranslator = Translation.getClient(options)
      englishGermanTranslator.downloadModelIfNeeded(...)
      englishGermanTranslator.translate(text)
      SuccessListener -> :String, or a NOT_FOUND MLKitException
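Chained together with Task continuations, the download-then-translate flow might look like this sketch (note the client factory is `Translation.getClient`; the download is a no-op once the language packs are on device):

```kotlin
import com.google.mlkit.nl.translate.TranslateLanguage
import com.google.mlkit.nl.translate.Translation
import com.google.mlkit.nl.translate.TranslatorOptions

fun translateEnToDe(text: String, onResult: (String) -> Unit) {
    val options = TranslatorOptions.Builder()
        .setSourceLanguage(TranslateLanguage.ENGLISH)
        .setTargetLanguage(TranslateLanguage.GERMAN)
        .build()
    val translator = Translation.getClient(options)
    translator.downloadModelIfNeeded()
        // Only runs translate once the source/target model packs are available
        .onSuccessTask { translator.translate(text) }
        .addOnSuccessListener { translated -> onResult(translated) }
        .addOnCompleteListener { translator.close() }
}
```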
  18. The Face API is detection, not identification
      SuccessListener -> :Face
      ML Kit finds faces in an image or video as:
      ★ Landmarks: a single point on a face; X,Y values for the center of the eye, nose, or mouth
      ★ Contours: a list of points that describe a feature like the left eye, right eye, or the mouth
      ★ Features: smiling, left and right eyes open
  19. Face Landmarks: X,Y float coordinates are returned by the API
      LEFT_CHEEK: the midpoint between the mouth corner and the outer corner of the eye.
      LEFT_EAR: the midpoint of the left ear tip and left ear lobe.
      LEFT_EYE: the center of the left eye cavity.
      MOUTH_BOTTOM: the center of the bottom lip.
      MOUTH_LEFT: the left mouth corner where the lips meet.
      MOUTH_RIGHT: the right mouth corner where the lips meet.
      NOSE_BASE: the midpoint between the nostrils where the nose meets the face.
      RIGHT_CHEEK: the midpoint between the mouth corner and the outer corner of the eye.
      RIGHT_EAR: the midpoint of the right ear tip and right ear lobe.
      RIGHT_EYE: the center of the right eye cavity.
  20. Face Contours, each with a description and a list of float coordinates
      FACE: the outline of the face.
      LEFT_CHEEK: the center of the left cheek.
      LEFT_EYE: the outline of the left eye cavity.
      LEFT_EYEBROW_BOTTOM: the bottom outline of the left eyebrow.
      LEFT_EYEBROW_TOP: the top outline of the subject's left eyebrow.
      LOWER_LIP_BOTTOM: the bottom outline of the subject's lower lip.
      LOWER_LIP_TOP: the top outline of the subject's lower lip.
      NOSE_BOTTOM: the bottom outline of the subject's nose.
      NOSE_BRIDGE: the outline of the subject's nose bridge.
      RIGHT_CHEEK: the center of the right cheek.
      RIGHT_EYE: the outline of the subject's right eye cavity.
      RIGHT_EYEBROW_BOTTOM: the bottom outline of the subject's right eyebrow.
      RIGHT_EYEBROW_TOP: the top outline of the subject's right eyebrow.
      UPPER_LIP_BOTTOM: the bottom outline of the subject's upper lip.
      UPPER_LIP_TOP: the top outline of the subject's upper lip.
  21. Pose API
      ★ BlazePose is a new model with 33 landmarks
      ★ A model to match BlazeFace and BlazePalm
      ★ The popular COCO topology has only 17 landmarks
      ★ Pose: one face/body tracked in video with a 25-point upper-body Pose model
      ★ DensePose is another model built up from COCO to classify and describe all the people in any given image
  22. Pose API
      ★ DetectorMode: one image or a video stream
      ★ 2 PerformanceMode options:
        ◦ FAST, ~30 to ~45+ fps: responsive, real-time detection; speed depends on the device hardware
        ◦ ACCURATE, ~10+ fps: slower, more accurate X, Y, and visibility
  23. The Pose API depends on Face landmarks
      SuccessListener -> :Pose
      An ML Kit Pose is a list of PoseLandmark points:
      ★ PoseLandmark: one of 33 possible defined PoseLandmark.Types
      ★ getInFrameLikelihood(): accuracy float
      ★ getPosition(): X,Y float coordinates
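The Pose API follows the same client pattern as the other detectors; a sketch using the accurate detector from the 2020 beta (package and option names are from that release and may differ in later versions):

```kotlin
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.pose.PoseDetection
import com.google.mlkit.vision.pose.PoseLandmark
import com.google.mlkit.vision.pose.accurate.AccuratePoseDetectorOptions

fun detectPose(image: InputImage) {
    val options = AccuratePoseDetectorOptions.Builder()
        // SINGLE_IMAGE_MODE for stills; STREAM_MODE for tracking in video
        .setDetectorMode(AccuratePoseDetectorOptions.SINGLE_IMAGE_MODE)
        .build()
    PoseDetection.getClient(options).process(image)
        .addOnSuccessListener { pose ->
            // Each landmark is queried by its PoseLandmark type
            val nose = pose.getPoseLandmark(PoseLandmark.NOSE)
            val x = nose?.position?.x
            val inFrame = nose?.inFrameLikelihood
        }
}
```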

  25. CREDITS
      Thanks very much to Rishit Dagli. Special thanks to all the people who made and released these awesome resources for free:
      ▪ Presentation template by SlidesCarnival
      ▪ Photographs by Unsplash
  26. LINKS
      ★ ML Kit | Google Developers
      ★ Responsible AI principles from Microsoft
      ★ Responsibilities – Google AI
      ★ Responsible AI | TensorFlow
      ★ Google AI Blog: On-device, Real-time Body Pose Tracking with MediaPipe BlazePose
      ★ Google Cloud Platform
      ★ Model Card BlazePose Upper-Body (1).pdf - Google Drive
      ★ Home | TensorFlow Hub
      ★ DensePose
      ★ TensorFlow
      ★ The TensorFlow Blog
      ★ What’s new in TensorFlow Lite for NLP — The TensorFlow Blog