
ML Kit, Practical A.I. for Mobile


This talk showcases the ML Kit APIs from Google, which can make iOS and Android apps more engaging and helpful with computer vision that is optimized to run on device. The APIs are built with the same TensorFlow technology used in Google Translate and Google Photos. The example code for text recognition shows the pattern common to all of the API models: text recognition, face detection, barcode scanning, image labeling, object detection, object tracking, smart reply, text translation, and language identification.

Elizabeth's talk will unveil the latest version of the open-source RoloScan app, now migrated to Kotlin and moved from the Firebase APIs to the new ML Kit packages.

The RoloScan source code is available on GitHub. A compiled RoloScan APK is sold on the Play Store.

Elizabeth Mezias

September 26, 2020


Transcript

  1. I am Elizabeth Mezias (@bethmezias, @bostonandroid). GDG Boston Android Lead Organizer, WTM Ambassador, Senior Android Engineer
  2. Code of Conduct
     ★ Be nice, friendly, welcoming. Be someone that other people want to be around.
     ★ Be respectful and constructive. Everyone should take responsibility for the community and defuse tension to stop a negative thread ASAP.
     ★ Be collaborative. Work together! We can learn a lot from each other. Share knowledge, and help each other out.
     ★ Participate! Join in the discussions, offer feedback, and help implement that feedback.
     https://developers.google.com/community/guidelines
  3. Social Responsibility
     Fairness: AI systems should treat all people fairly
     Inclusiveness: AI systems should empower everyone and engage people
     Reliability & Safety: AI systems should perform reliably and safely
     Transparency: AI systems should be understandable
     Privacy & Security: AI systems should be secure and respect privacy
     Accountability: People should be accountable for AI systems
  4. ML Kit: 10 AI models
     ML Kit in 2016, then Firebase, and in 2020 back to ML Kit
     https://developers.google.com/ml-kit
  5. ML Kit 2020
     ★ Started in play-services with 3 computer vision models in 2016: Faces, Barcodes, and Text
     ★ Added pose detection, image detection, tracking, and labeling, plus digital ink for handwriting
     ★ New Natural Language APIs identify the language, translate text, and give a smart reply
  6. “ML Kit brings machine learning expertise to mobile in a powerful and easy-to-use package that makes apps more engaging, personalized, and helpful with solutions that run on device.”
  7. Machine Learning in a box
     ▪ Vision and Natural Language models
     ▪ Image classification/recognition is a common use for custom models
     ▪ You need to divide the data set
       ▫ Training data: examples to build the model
       ▫ Testing data: to grade the accuracy
  8. Juicy details are found online
     ▪ TensorFlow.org: the project on GitHub is used by 95,654 repositories, with 2,735+ contributors
     ▪ 195 Google Public Datasets
     ▪ Easy-to-use ML in Python, C, JavaScript, Java, etc.
     ▪ MNIST database of handwritten digits
     ▪ MS COCO: Common Objects in Context
  9. BERT, MobileBERT, or ALBERT
     A recent blog on Natural Language offers reference code and pretrained models:
     ▪ Text classification: label text from text data.
     ▪ Q&A: feed the model an article and sample questions, and it can answer user questions.
     ▪ Smart Reply: the user’s context generates replies.
     On the TensorFlow Hub, ready for Android Studio or Xcode
  10. SIMPLE SETUP
      Android: the project build.gradle file has Google Maven, and the app or module using the APIs has the GMS ML Kit dependency play-services-mlkit-text-recognition. The metadata is set in the manifest: mlkit.vision.DEPENDENCIES = 'ocr'
      iOS: include the pod 'GoogleMLKit/TextRecognition' and open the project with Xcode version 11.3.1 or later
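The Gradle side of that setup can be sketched in Kotlin DSL; the artifact version is an assumption from around the 2020 standalone ML Kit launch, so check the release notes for the current one:

```kotlin
// app/build.gradle.kts: the Google Play services ("unbundled") text recognition model
dependencies {
    // Version is illustrative; see the ML Kit release notes for the latest
    implementation("com.google.android.gms:play-services-mlkit-text-recognition:16.0.0")
}
```

With the manifest entry `<meta-data android:name="com.google.mlkit.vision.DEPENDENCIES" android:value="ocr" />`, Google Play downloads the OCR model at install time instead of on first use.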
  11. 1. Prepare the input image
      2. Get an instance of TextRecognizer
      3. Process the image
      4. Use the blocks of recognized text
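The four steps above can be sketched as one Kotlin function; this is a sketch against the 2020 API surface (the no-argument `TextRecognition.getClient()`) and only runs on an Android device:

```kotlin
import android.content.Context
import android.net.Uri
import android.util.Log
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.text.Text
import com.google.mlkit.vision.text.TextRecognition

fun recognizeText(context: Context, imageUri: Uri, onResult: (Text) -> Unit) {
    // 1. Prepare the input image; rotation is read from the file's EXIF data
    val image = InputImage.fromFilePath(context, imageUri)
    // 2. Get an instance of TextRecognizer
    val recognizer = TextRecognition.getClient()
    // 3. Process the image; the returned Task completes asynchronously
    recognizer.process(image)
        // 4. Use the blocks of recognized text
        .addOnSuccessListener { text -> onResult(text) }
        .addOnFailureListener { e -> Log.e("OCR", "Text recognition failed", e) }
}
```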
  12. 3 lines to OCR
      val imageInputStream = context.contentResolver.openInputStream(imageUri)
      val decodedBitmap = BitmapFactory.decodeStream(imageInputStream, null, opts)
      InputImage.fromBitmap(decodedBitmap, 0)
      Feed the InputImage object into the TextRecognizer on a background thread
  13. Coroutines for easy threading
      suspend fun scanImage(image: InputImage): Text {
          val client = TextRecognition.getClient()
          try {
              return client.process(image).await()  // await() from kotlinx-coroutines-play-services
          } finally {
              client.close()
          }
      }
      SuccessListener -> :Text
      The ML Kit Text object is the hierarchical representation: a list of Text.TextBlock, which is a list of Text.Line, which is a list of Text.Element.
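Walking that hierarchy is plain collection code; a small helper, assuming you want one String per recognized line:

```kotlin
import com.google.mlkit.vision.text.Text

// Flatten Text -> TextBlock -> Line into a list of line strings
fun linesOf(result: Text): List<String> =
    result.textBlocks.flatMap { block -> block.lines.map { line -> line.text } }
```

Each `Text.Line` still exposes its elements (roughly words) if finer granularity is needed.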
  14. Barcode Scanning
      Reads standard barcodes with any rotation or camera orientation
      ★ Linear formats: Codabar, Code 39, Code 93, Code 128, EAN-8, EAN-13, ITF, UPC-A, UPC-E
      ★ 2D formats: Aztec, Data Matrix, PDF417, QR Code
      Uses the same InputImage type as TextRecognition
  15. Barcode Scanning
      val scanner = BarcodeScanning.getClient()
      scanner.process(image)
      SuccessListener -> Barcode
      A Barcode is a single recognized code and its value. Enums for the type, and the expected fields for the supported format, are found in it. Structured data looks like this:
      when (valueType) {
          Barcode.TYPE_URL -> {
              val title = barcode.url!!.title
              val url = barcode.url!!.url
          }
      }
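A fuller version of that success listener, branching on the structured value type; a sketch against the 2020 `com.google.mlkit.vision.barcode` package (the Barcode class later moved to `barcode.common`):

```kotlin
import com.google.mlkit.vision.barcode.Barcode
import com.google.mlkit.vision.barcode.BarcodeScanning
import com.google.mlkit.vision.common.InputImage

fun scanBarcodes(image: InputImage, onUrl: (title: String?, url: String?) -> Unit) {
    val scanner = BarcodeScanning.getClient()
    scanner.process(image)
        .addOnSuccessListener { barcodes ->
            // The Task yields every barcode found in the frame
            for (barcode in barcodes) {
                when (barcode.valueType) {
                    Barcode.TYPE_URL ->
                        onUrl(barcode.url?.title, barcode.url?.url)
                    else -> {
                        // rawValue is always present as a fallback
                        val raw = barcode.rawValue
                    }
                }
            }
        }
}
```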
  16. Translate Text
      The Natural Language APIs work with Strings and digital text, not images
      ★ More than 50 languages
      ★ Models proven with Google Translate in offline mode
      ★ Runs fast on-device, with language packs to manage the space required
      ML Kit's models are built on translating to and from English. For other languages, English is used as an intermediate, which can affect the result quality. Language models are around 30MB each.
  17. Source/Target models = 60MB
      val options = TranslatorOptions.Builder()
          .setSourceLanguage(TranslateLanguage.ENGLISH)
          .setTargetLanguage(TranslateLanguage.GERMAN)
          .build()
      val englishGermanTranslator = Translation.getClient(options)
      englishGermanTranslator.downloadModelIfNeeded(...)
      englishGermanTranslator.translate(text)
      SuccessListener -> :String, or a NOT_FOUND MLKitException
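Chained together with Task continuations, the download-then-translate flow might look like this sketch (note the client factory is `Translation.getClient`; the download is a no-op once the language packs are on device):

```kotlin
import com.google.mlkit.nl.translate.TranslateLanguage
import com.google.mlkit.nl.translate.Translation
import com.google.mlkit.nl.translate.TranslatorOptions

fun translateEnToDe(text: String, onResult: (String) -> Unit) {
    val options = TranslatorOptions.Builder()
        .setSourceLanguage(TranslateLanguage.ENGLISH)
        .setTargetLanguage(TranslateLanguage.GERMAN)
        .build()
    val translator = Translation.getClient(options)
    translator.downloadModelIfNeeded()
        // Only runs translate once the source/target model packs are available
        .onSuccessTask { translator.translate(text) }
        .addOnSuccessListener { translated -> onResult(translated) }
        .addOnCompleteListener { translator.close() }
}
```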
  18. The Face API is detection, not identification
      SuccessListener -> :Face
      ML Kit finds faces in an image or video as:
      ★ Landmarks: a single point on a face; X,Y values for the center of the eye, nose, or mouth
      ★ Contours: a list of points that describe a feature like the left eye, right eye, or the mouth
      ★ Features: smiling, left and right eyes open
  19. Face Landmarks: X,Y float coordinates are returned by the API
      LEFT_CHEEK: the midpoint between the mouth corner and the outer corner of the eye.
      LEFT_EAR: the midpoint of the left ear tip and left ear lobe.
      LEFT_EYE: the center of the left eye cavity.
      MOUTH_BOTTOM: the center of the bottom lip.
      MOUTH_LEFT: the left mouth corner where the lips meet.
      MOUTH_RIGHT: the right mouth corner where the lips meet.
      NOSE_BASE: the midpoint between the nostrils where the nose meets the face.
      RIGHT_CHEEK: the midpoint between the mouth corner and the outer corner of the eye.
      RIGHT_EAR: the midpoint of the right ear tip and right ear lobe.
      RIGHT_EYE: the center of the right eye cavity.
  20. Face Contours, each with a description and a list of float coordinates
      FACE: the outline of the face.
      LEFT_CHEEK: the center of the left cheek.
      LEFT_EYE: the outline of the left eye cavity.
      LEFT_EYEBROW_BOTTOM: the bottom outline of the left eyebrow.
      LEFT_EYEBROW_TOP: the top outline of the subject's left eyebrow.
      LOWER_LIP_BOTTOM: the bottom outline of the subject's lower lip.
      LOWER_LIP_TOP: the top outline of the subject's lower lip.
      NOSE_BOTTOM: the bottom outline of the subject's nose.
      NOSE_BRIDGE: the outline of the subject's nose bridge.
      RIGHT_CHEEK: the center of the right cheek.
      RIGHT_EYE: the outline of the subject's right eye cavity.
      RIGHT_EYEBROW_BOTTOM: the bottom outline of the subject's right eyebrow.
      RIGHT_EYEBROW_TOP: the top outline of the subject's right eyebrow.
      UPPER_LIP_BOTTOM: the bottom outline of the subject's upper lip.
      UPPER_LIP_TOP: the top outline of the subject's upper lip.
  21. Pose API
      ★ BlazePose is a new model with 33 landmarks
      ★ A model to match BlazeFace and BlazePalm
      ★ The popular COCO topology has only 17 landmarks
      ★ Pose: one face/body tracked in video with a 25-point upper-body Pose model
      ★ DensePose is another model built up from COCO to classify and describe all the people in any given image
  22. Pose API
      ★ DetectorMode: one image or a video stream
      ★ 2 PerformanceMode options:
        ◦ FAST, ~30 to ~45+ fps: responsive, real-time detection; speed depends on the device hardware
        ◦ ACCURATE, ~10+ fps: slower, more accurate X, Y, and visibility
  23. The Pose API depends on Face landmarks
      SuccessListener -> :Pose
      An ML Kit Pose is a list of PoseLandmark points:
      ★ PoseLandmark: one of 33 possible defined PoseLandmark.Types
      ★ getInFrameLikelihood(): accuracy float
      ★ getPosition(): X,Y float coordinates
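The Pose API follows the same client pattern as the other detectors; a sketch using the accurate detector from the 2020 beta (package and option names are from that release and may differ in later versions):

```kotlin
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.pose.PoseDetection
import com.google.mlkit.vision.pose.PoseLandmark
import com.google.mlkit.vision.pose.accurate.AccuratePoseDetectorOptions

fun detectPose(image: InputImage) {
    val options = AccuratePoseDetectorOptions.Builder()
        // SINGLE_IMAGE_MODE for stills; STREAM_MODE for tracking in video
        .setDetectorMode(AccuratePoseDetectorOptions.SINGLE_IMAGE_MODE)
        .build()
    PoseDetection.getClient(options).process(image)
        .addOnSuccessListener { pose ->
            // Each landmark is queried by its PoseLandmark type
            val nose = pose.getPoseLandmark(PoseLandmark.NOSE)
            val x = nose?.position?.x
            val inFrame = nose?.inFrameLikelihood
        }
}
```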

  25. CREDITS
      Thanks very much to Rishit Dagli. Special thanks to all the people who made and released these awesome resources for free:
      ▪ Presentation template by SlidesCarnival
      ▪ Photographs by Unsplash
  26. LINKS
      ★ ML Kit | Google Developers
      ★ Responsible AI principles from Microsoft
      ★ Responsibilities – Google AI
      ★ Responsible AI | TensorFlow
      ★ Google AI Blog: On-device, Real-time Body Pose Tracking with MediaPipe BlazePose
      ★ Google Cloud Platform
      ★ Model Card BlazePose Upper-Body (1).pdf - Google Drive
      ★ Home | TensorFlow Hub
      ★ DensePose
      ★ TensorFlow
      ★ The TensorFlow Blog
      ★ What’s new in TensorFlow Lite for NLP — The TensorFlow Blog